DUNE-DAQ / drunc

Dune RUN Control (DRUNC) is the run control for the DUNE experiment

Lots of "ConfigClient.cpp:120] Failed to lookup <connection_name>" errors in logfiles #227

Open bieryAtFnal opened 2 months ago

bieryAtFnal commented 2 months ago

I'm not sure that this is the best repo in which to file this issue, but I'll start here, and we can move it if we need to.

I noticed these errors when running a small sample system with the 27-Aug nightly build.

The error messages are in the daq_application logs, and they look like the following:

log_biery_test-session_dfo-01.log:2024-Aug-27 09:25:56,781 ERROR [dunedaq::iomanager::ConnectionResponse dunedaq::iomanager::ConfigClient::resolveConnection(const dunedaq::iomanager::ConnectionRequest&, std::string) at /tmp/root/spack-stage/spack-stage-iomanager-NB_DEV_240827_A9-3ixqk73hwmn33vffzaqkxtqflb2y4os2/spack-src/src/network/ConfigClient.cpp:120] Failed to lookup trigger_inhibit at /getconnection/test-session connect: Connection refused

I believe that the errors all occur within a short period at the start of the run, so my guess is that the ConnectivityServer needs to be given a little time to start up.
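If that guess is right, one possible mitigation is to retry the lookup (or wait before starting) until the server accepts connections. The following is only a minimal sketch in shell, not how drunc or iomanager handle this: `retry` is a hypothetical helper, and the host, port, and the `nc -z` probe in the commented usage line are assumptions.

```shell
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Written without "local" so it also works in plain POSIX sh.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1    # give up: every attempt failed
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    return 0
}

# Example (assumed host and port; not taken from the logs above):
# retry 10 1 nc -z "$CONN_HOST" "$CONN_PORT" && echo "connectivity server is reachable"
```

A fixed delay keeps the sketch short; a real fix would more likely belong in the client or in the boot sequence than in a wrapper script.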

bieryAtFnal commented 2 months ago

Here are sample steps to demonstrate these messages:

DATE_PREFIX=$(date '+%d%b')
TIME_SUFFIX=$(date '+%H%M')

source /cvmfs/dunedaq.opensciencegrid.org/setup_dunedaq.sh
setup_dbt latest_v5
dbt-create -n NFD_DEV_240827_A9 ${DATE_PREFIX}FDv5Test_${TIME_SUFFIX}
cd ${DATE_PREFIX}FDv5Test_${TIME_SUFFIX}/sourcecode

git clone https://github.com/DUNE-DAQ/appmodel.git -b glm/action_plans
cd ..

dbt-workarea-env
dbt-build -j 12
dbt-workarea-env

mkdir rundir
cd rundir

# Execute the following commands by hand:

killall drunc-controller
drunc-unified-shell ssh-standalone

# within drunc

boot test/config/test-session.data.xml test-session
fsm conf
fsm start run_number 101
fsm enable_triggers
# wait for a few seconds
fsm disable_triggers
fsm drain_dataflow
fsm stop_trigger_sources
fsm stop
fsm scrap
exit

# run the following command to see the error messages
grep -E 'ERROR|WARNING' log_* | grep 'Connection refused'
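To see which connections are affected and how often, the matching lines can be grouped by connection name. This is a rough sketch that assumes every message has the same "Failed to lookup <connection_name> at ..." shape as the sample above; `count_failed_lookups` is a hypothetical helper name.

```shell
# Hypothetical helper: count "Failed to lookup" errors per connection name
# across the given log files, most frequent first.
count_failed_lookups() {
    grep -h 'Failed to lookup' "$@" \
        | sed -n 's/.*Failed to lookup \([^ ]*\) at .*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Example, run from rundir:
# count_failed_lookups log_*
```

Each output line is a count followed by a connection name, which makes it easier to tell whether a few connections or all of them hit the startup window.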