DUNE-DAQ / minidaqapp

Post-2.8.2 confgen fix for multiple separate DQM processes #103

Closed bieryAtFnal closed 2 years ago

bieryAtFnal commented 2 years ago

In testing dunedaq-v2.8.2, Florian noticed that the 2nd, 3rd, etc. DQM processes were reporting errors and not processing events. I tracked this down to additional confgen bugs, and the branch associated with this PR contains the fixes. Juan, I went back to using network endpoints in the range [0..14]. One of the problems that I found was that I hadn't switched to the mode of using [0_0..0_4, 1_0..1_4, etc.] everywhere it was needed. Instead of modifying those additional instances to use `{HOSTIDX}_{idx}`, I just went back to the original model of `network_endpoints[f"datareqdqm{hostidx*number_of_data_producers+idx}"]`. Another problem was in the naming of the qton_datareqdqm modules in the dqm confgen.
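For readers following the indexing discussion, here is a minimal sketch of the flat numbering described above. It is not the confgen code itself: the helper names `flat_endpoint_index` and `dqm_endpoint_names` are hypothetical, and the `datareqdqm` prefix is simply copied as it appears in this comment. It only illustrates how `hostidx*number_of_data_producers+idx` yields one unique index per (host, producer) pair, covering [0..14] for 3 readout hosts with 5 data producers each.

```python
from typing import List


def flat_endpoint_index(hostidx: int, idx: int, number_of_data_producers: int) -> int:
    """Map a (host, producer) pair onto a single flat index.

    Hypothetical helper: mirrors the expression
    hostidx*number_of_data_producers+idx quoted in the comment above.
    """
    if not 0 <= idx < number_of_data_producers:
        raise ValueError("idx must be in [0, number_of_data_producers)")
    return hostidx * number_of_data_producers + idx


def dqm_endpoint_names(num_hosts: int, number_of_data_producers: int,
                       prefix: str = "datareqdqm") -> List[str]:
    """Build one endpoint key per (host, producer) pair.

    The prefix is an assumption taken from the text of this comment,
    not verified against the confgen sources.
    """
    return [
        f"{prefix}{flat_endpoint_index(h, i, number_of_data_producers)}"
        for h in range(num_hosts)
        for i in range(number_of_data_producers)
    ]


# Three readout hosts with five producers each, as in the reproduction command.
names = dqm_endpoint_names(num_hosts=3, number_of_data_producers=5)
```

Because every module that refers to an endpoint derives the key from the same expression, the producer side and the DQM side cannot drift apart the way they did when only some places had been switched to the `{HOSTIDX}_{idx}` form.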

With these changes, I can run multiple DQM processes without seeing errors during a run. Of course, my tests were with emulated data, and further tests would be appreciated.

bieryAtFnal commented 2 years ago

Eric, might you have time to validate this fix sometime? I think that it is important to get it into the develop branch and then into the NetworkManager2 branch.

To reproduce the problem, we can use

python -m minidaqapp.nanorc.mdapp_multiru_gen --host-ru localhost --host-ru localhost --host-ru localhost -d $PWD/frames.bin -o . -s 10 -n 5 --enable-dqm mdapp_6proc_dqm
tmprun=111; runduration=60; waitAfterStop=2; local_backup log_*; nanorc mdapp_6proc_dqm boot init conf start ${tmprun} wait 2 resume wait ${runduration} pause wait 2 stop wait ${waitAfterStop} scrap terminate

When this is run with the minidaqapp develop branch, there are quite a few errors in the logs (grep ERROR log*). With the code on the kbiery/post282_dqmproc_fix branch, the errors should all be gone. Thanks, Kurt

jmcarcell commented 2 years ago

A bit late, but just in case: I did some testing, and it's working fine with multiple processes.