DUNE-DAQ / minidaqapp


Parameters for running DQM with multiple links #72

Closed jmcarcell closed 3 years ago

jmcarcell commented 3 years ago

Goes together with https://github.com/DUNE-DAQ/dqm/pull/51

bieryAtFnal commented 3 years ago

I tried to run a test similar to the one that I mentioned in the earlier PR for DQM config changes...

In today's test, I used the HEAD of the develop branch in many repositories, and I used the jcarcell/Several-links branch in the dqm repo and the jmcarcell/DQM-multiple-links branch in the minidaqapp repo. Here are the commands that I used:

curl -o frames.bin -O https://cernbox.cern.ch/index.php/s/7qNnuxD8igDOVJT/download
python -m minidaqapp.nanorc.mdapp_multiru_gen --host-ru localhost -d $PWD/frames.bin -o . -s 10 --enable-dqm mdapp_4proc_withDQM
nanorc mdapp_4proc_withDQM boot init conf start 101 resume wait 20 pause wait 3 stop wait 2 start 102 resume wait 20 pause wait 3 stop wait 1 start 103 resume wait 20 pause wait 3 stop wait 2 start 104 resume wait 20 pause wait 3 stop wait 1 scrap terminate
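
The multi-run sequence above is long and easy to mistype, so here is a minimal Python sketch that assembles the same command string. It only reuses the nanorc commands shown in this thread; the helper name `build_run_sequence` and its parameters are hypothetical and are not part of nanorc or minidaqapp.

```python
def build_run_sequence(config_dir, run_numbers,
                       run_seconds=20, pause_seconds=3, stop_waits=(2, 1)):
    """Assemble a nanorc command line that cycles through several runs.

    Hypothetical helper: it just joins the commands used in this thread.
    stop_waits is cycled so the wait after each stop alternates 2, 1, 2, 1.
    """
    parts = ["nanorc", config_dir, "boot", "init", "conf"]
    for index, run in enumerate(run_numbers):
        stop_wait = stop_waits[index % len(stop_waits)]
        parts += [
            "start", str(run), "resume", "wait", str(run_seconds),
            "pause", "wait", str(pause_seconds),
            "stop", "wait", str(stop_wait),
        ]
    parts += ["scrap", "terminate"]
    return " ".join(parts)


if __name__ == "__main__":
    # Prints the four-run sequence for the mdapp_4proc_withDQM configuration.
    print(build_run_sequence("mdapp_4proc_withDQM", [101, 102, 103, 104]))
```

Changing the run list or the configuration directory name is then a one-argument change rather than an edit to a long line.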

I still see messages from the DQM TRB about TriggerDecisions for the wrong run. Here are some logfile messages that seemed unexpected:

2021-Aug-04 13:20:14,389 ERROR [void dunedaq::dfmodules::TriggerRecordBuilder::do_work(std::atomic<bool>&) at /home/biery/dunedaq/04Aug/sourcecode/dfmodules/plugins/TriggerRecordBuilder.cpp:289] Unexpected Trigger Decisions: 15/101 while in run 102
2021-Aug-04 13:20:40,285 LOG [2021-Aug-04 13:20:40,285void dunedaq::dqm::DQMProcessor::RequestMaker() at /home/biery/dunedaq/04Aug/sourcecode/dqm/plugins/DQMProcessor.cpp:189 ERROR ] [DQM: Unable to pop from the data queue
2021-Aug-04 13:21:07,201 ERROR [void dunedaq::dfmodules::TriggerRecordBuilder::do_work(std::atomic<bool>&) at /home/biery/dunedaq/04Aug/sourcecode/dfmodules/plugins/TriggerRecordBuilder.cpp:289] Unexpected Trigger Decisions: 46/103 while in run 104
2021-Aug-04 13:21:08,227 ERROR [bool dunedaq::dfmodules::TriggerRecordBuilder::read_fragments(dunedaq::dfmodules::TriggerRecordBuilder::fragment_sources_t&, bool) at /home/biery/dunedaq/04Aug/sourcecode/dfmodules/plugins/TriggerRecordBuilder.cpp:458] Unexpected Fragment for triggerID 45-0/103, type 1, type: TPC, region: 0, element: 1
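
To pull these wrong-run messages out of a longer logfile, a small Python sketch like the following may help. It assumes the message text is exactly as in the lines above, and it assumes that a pair such as 15/101 means trigger number / run number; that reading is my interpretation, not something the log states. The function name `find_stale_decisions` is hypothetical.

```python
import re

# Matches e.g. "Unexpected Trigger Decisions: 15/101 while in run 102".
# Assumption: the pair before "while" is <trigger number>/<run number>.
STALE_DECISION = re.compile(
    r"Unexpected Trigger Decisions: (\d+)/(\d+) while in run (\d+)"
)


def find_stale_decisions(log_lines):
    """Return (trigger, decision_run, current_run) for each matching line."""
    hits = []
    for line in log_lines:
        match = STALE_DECISION.search(line)
        if match:
            trigger, decision_run, current_run = map(int, match.groups())
            hits.append((trigger, decision_run, current_run))
    return hits
```

Passing an open logfile (for example `find_stale_decisions(open(path))`, with `path` being whichever logfile is of interest) gives one tuple per offending message, which makes it easy to confirm that each decision belongs to the run immediately before the current one.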

bieryAtFnal commented 3 years ago

I'm also seeing problems with multiple runs and multiple Readout Apps.

The following commands don't get past the second run.

curl -o frames.bin -O https://cernbox.cern.ch/index.php/s/7qNnuxD8igDOVJT/download
python -m minidaqapp.nanorc.mdapp_multiru_gen --host-ru localhost -d $PWD/frames.bin -o . -s 10 --enable-dqm mdapp_6proc_withDQM
nanorc mdapp_6proc_withDQM boot init conf start 101 resume wait 20 pause wait 3 stop wait 2 start 102 resume wait 20 pause wait 3 stop wait 1 start 103 resume wait 20 pause wait 3 stop wait 2 start 104 resume wait 20 pause wait 3 stop wait 1 scrap terminate

To be fair, I wonder if this is some sort of system configuration issue. I see messages in the TRACE log saying that the TimestampEstimator is way behind (by about 10 seconds). Maybe a TimeSync queue isn't being connected to the right module?

jmcarcell commented 3 years ago

On the TimestampEstimator messages: TimestampEstimator is not a plugin, so it doesn't receive the stop command, and the only way to stop it is to destroy it, which I now do in the latest commit of the dqm branch. With that change, the messages that you were seeing no longer appear. I am running

nanorc mdapp_4proc_withDQM boot init conf start 101 resume wait 20 pause wait 3 stop wait 2 start 102 resume wait 20 pause wait 3 stop wait 1 start 103 resume wait 20 pause wait 3 stop wait 2 start 104 resume wait 20 pause wait 3 stop wait 1 scrap terminate

and I'm not seeing any errors, although I have seen the ones you mention before. They are probably related to the start / stop discussion of the TRB, since the error only happened once after each start / stop. I'm confused about the second part, since it's the same commands as in the first one but with the name of the folder changed.

I pushed two more commits to make it easier to run on pocket, but they are unrelated.

jmcarcell commented 3 years ago

All the relevant changes are now in develop in dqm. There are now a few opmon variables; they can be found in the info files by looking for dunedaq.dqm. data_deliveries should be the same as requests, and total_data_deliveries should be the same as total_requests; the total ones only increase with time. The commands have not changed. I'll do more testing, but on my side this is ready for merging, and the changes only affect DQM.
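
The relations between the opmon variables can be checked mechanically. The sketch below is an illustration only: the layout of the info files is not shown in this thread, so it assumes the DQM counters have already been extracted into a time-ordered list of dicts keyed by the four variable names mentioned above. The function name `check_dqm_counters` is hypothetical.

```python
def check_dqm_counters(snapshots):
    """Check the expected relations between the DQM opmon counters.

    snapshots: time-ordered list of dicts with the keys requests,
    data_deliveries, total_requests and total_data_deliveries.
    (Assumed input shape; reading the info files is left to the caller.)
    Returns a list of human-readable problems, empty if all checks pass.
    """
    problems = []
    previous = None
    for index, snap in enumerate(snapshots):
        if snap["data_deliveries"] != snap["requests"]:
            problems.append(f"snapshot {index}: data_deliveries != requests")
        if snap["total_data_deliveries"] != snap["total_requests"]:
            problems.append(
                f"snapshot {index}: total_data_deliveries != total_requests"
            )
        if previous is not None:
            # The total counters should never go down between snapshots.
            for key in ("total_requests", "total_data_deliveries"):
                if snap[key] < previous[key]:
                    problems.append(f"snapshot {index}: {key} decreased")
        previous = snap
    return problems
```

An empty result means every snapshot satisfied the equalities and the totals never decreased; any other result names the snapshot and the relation that failed.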