Open eflumerf opened 1 week ago
This issue has been seen in multiple contexts, including the automated regression tests: https://github.com/DUNE-DAQ/daq-release/actions/runs/11755085773/job/32749783614#step:2:137. It usually self-resolves, but it would be good to understand why it is failing to start correctly.
Just to make sure I understand: there have not been any changes in the configuration? Is the connectivity server being started by the integtest or by drunc?
Unfortunately, if any application fails to start, the controller in charge of it cannot find it in the connectivity service, so it dies too. Finally, the `root-controller` looks up that controller on the connectivity service, that lookup fails, and so the `root-controller` also dies. In the short term, I suggest putting a `ps` straight after `boot` in the integtests, to see which applications are not booting correctly. In the long term, I guess the question is what exactly should happen to the control tree when a DAQ app fails to start correctly. I think what we want here is for the controller to still start, but in an error state.
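
As a rough sketch of the short-term check, something along these lines could list which expected applications are absent from the process table right after boot. The helper names `list_process_commands` and `find_missing_apps` are hypothetical and not part of drunc or the integtest framework, and matching by substring on the command line is an assumption about how the applications can be identified.

```python
import subprocess


def list_process_commands():
    """Return the full command line of every running process."""
    # "-A" selects all processes; "-o args=" prints only the command line,
    # and the trailing "=" suppresses the header row (both are POSIX options).
    result = subprocess.run(
        ["ps", "-A", "-o", "args="],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def find_missing_apps(expected_names, command_lines):
    """Return the expected names that appear in none of the command lines."""
    return [
        name
        for name in expected_names
        if not any(name in line for line in command_lines)
    ]
```

Calling `find_missing_apps(expected, list_process_commands())` straight after boot, with `expected` filled in from the session configuration, would give the list of applications that never came up.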
Additionally, if the `root-controller` fails, the `Trying to talk to top controller` step is not interruptible. The fix should be similar to the one for interruptible boot.
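
For illustration only, one generic way to make a retry loop like this interruptible is to wait on a `threading.Event` instead of sleeping. This is a sketch under assumptions, not how drunc implements interruptible boot: `wait_for_controller`, `probe`, and the returned status strings are all hypothetical names.

```python
import threading


def wait_for_controller(probe, stop_event, attempts=10, delay=0.5):
    """Poll probe() until it succeeds, the attempts run out, or a stop is requested.

    Returns "connected", "interrupted", or "timed_out".
    """
    for _ in range(attempts):
        if stop_event.is_set():
            return "interrupted"
        if probe():
            return "connected"
        # Event.wait returns True as soon as the event is set, so a stop
        # request cuts the delay short instead of waiting it out.
        if stop_event.wait(delay):
            return "interrupted"
    return "timed_out"
```

A signal handler or the shell's interrupt path would call `stop_event.set()`, and the loop would return within at most one probe call rather than blocking until the attempts are exhausted.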