Closed bieryAtFnal closed 1 year ago
It looks like the problem is coming from the WIB2TPHandler as part of the SWTPG. I will have a look and try to debug the issue.
By default, software_tpg_threshold is set to 100. If this value is too low, the number of TPs produced may be too large for the receiving subsystem to handle. I have not seen that error from the TPHandler with the following config file (note that I turned off the connectivity service because it was not working for me in this simple setup):
{
  "boot": {
    "use_connectivity_service": false,
    "start_connectivity_service": false
  },
  "dataflow": {
    "apps": [
      { "app_name": "dataflow0" },
      { "app_name": "dataflow1" }
    ]
  },
  "readout": {
    "enable_software_tpg": true,
    "software_tpg_threshold": 500,
    "clock_speed_hz": 62500000,
    "data_rate_slowdown_factor": 10,
    "data_files": [
      {"detector_id": 3, "data_file": "asset://?label=DuneWIB&subsystem=readout"}
    ]
  },
  "trigger": {
    "enable_tpset_writing": true,
    "trigger_activity_config": {"prescale": 1000},
    "trigger_window_before_ticks": 1000,
    "trigger_window_after_ticks": 1000,
    "trigger_rate_hz": 1.0
  }
}
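To illustrate why raising the threshold helps, here is a hypothetical sketch (not the actual SWTPG code) of a naive hit finder that emits one TP per contiguous run of ADC samples above the threshold. Raising software_tpg_threshold directly shrinks the number of TPs sent downstream, which is why 500 may avoid flooding the receiver while 100 does not. The function name and logic are illustrative only.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch: count "TPs" as contiguous runs of samples above
// the threshold. The real SWTPG hit finder is more sophisticated, but
// the threshold has the same qualitative effect on the TP rate.
int count_tps(const std::vector<int16_t>& adc, int16_t threshold)
{
  int tps = 0;
  bool in_hit = false;
  for (int16_t sample : adc) {
    if (sample > threshold && !in_hit) {
      ++tps;          // rising edge of a new hit
      in_hit = true;
    } else if (sample <= threshold) {
      in_hit = false; // hit (if any) has ended
    }
  }
  return tps;
}
```

With a waveform containing one small pulse (peak ~150 ADC) and one large pulse (peak ~600 ADC), a threshold of 100 yields two TPs while a threshold of 500 yields one.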
I also looked into the output tpstream file to check whether the values for channels, timestamps, and ADCs looked reasonable, and they seem to be. Let me know if this works for you as well.
Thanks, Adam. I've confirmed that the higher software_tpg_threshold eliminates those error messages.
Independent of that, I'd like to ask your advice on a different issue that happens when I stop and start multiple runs in the same DAQ session, using the daqconf.json file that you sent.
When I do that, I see messages like the following:
WARNING [void dunedaq::fdreadoutlibs::WIB2TPHandler::try_sending_tpsets(uint64_t) at /home/nfs/dunedaq/daqsw/04AprV4.0.0rc1Testing/sourcecode/fdreadoutlibs/include/fdreadoutlibs/wib2/WIB2TPHandler.hpp:96] Continuity of timestamps broken.
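My guess is that this warning comes from a monotonicity check on the TPSet timestamps. Here is a hypothetical sketch of such a check (not the actual WIB2TPHandler code; the class and member names are made up): if the last-seen timestamp survives across a stop/start cycle without being reset, the first TPSet of the next run would trip it.

```cpp
#include <cstdint>

// Hypothetical sketch of a timestamp-continuity check. A stale
// m_last_ts carried over from a previous run would make the first
// timestamp of the new run look like it went backwards.
class TimestampChecker
{
  uint64_t m_last_ts = 0;

public:
  // Returns true if the timestamp is consistent with the previous one.
  bool check(uint64_t ts)
  {
    bool ok = (m_last_ts == 0) || (ts >= m_last_ts);
    m_last_ts = ts;
    return ok;
  }

  // Would need to be called between runs to avoid spurious warnings.
  void reset() { m_last_ts = 0; }
};
```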
I used a command like the following for this latest test:
nanorc --partition-number 3 mdapp_adam/ ${USER}-test boot conf start_run 1111 wait 20 stop_run wait 2 start_run 1112 wait 20 stop_run wait 2 start_run 1113 wait 20 stop_run scrap terminate
The warning messages seem to appear after the first run has stopped, and they continue throughout the second and third runs.
Any ideas? Thanks
That warning message originates from the fact that during the coldbox runs we noticed that TPSets were arriving at the trigger out of order. The fix for that problem was to increase the wait time before producing TPSets and to drop any TPSets older than a certain age. We added that warning so we would know when this condition was occurring. Having said that, the question here is what is happening after you stop the (first) run, which probably needs some investigation.
I also agree that in the future it would be best to increment a counter of dropped TPSets when we fall into that condition.
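The drop-old-TPSets fix plus the dropped counter can be sketched roughly as follows. This is a hypothetical illustration, not the actual fdreadoutlibs code; the class name, member names, and the margin value are all made up.

```cpp
#include <cstdint>

// Hypothetical sketch: forward a TPSet only if its timestamp is no more
// than max_age ticks behind the newest timestamp seen so far, and keep
// a counter of how many TPSets were dropped as too old.
class TPSetFilter
{
  uint64_t m_newest_ts = 0;
  uint64_t m_max_age;
  uint64_t m_num_dropped = 0;

public:
  explicit TPSetFilter(uint64_t max_age) : m_max_age(max_age) {}

  // Returns true if the TPSet should be forwarded, false if dropped.
  bool accept(uint64_t ts)
  {
    if (ts > m_newest_ts) {
      m_newest_ts = ts;  // in-order: advance the high-water mark
      return true;
    }
    if (m_newest_ts - ts <= m_max_age) {
      return true;       // late, but within the allowed margin
    }
    ++m_num_dropped;     // too old: drop and count it
    return false;
  }

  uint64_t num_dropped() const { return m_num_dropped; }
};
```

With a margin of 100 ticks: a TPSet 50 ticks behind the newest is still forwarded, while one 200 ticks behind is dropped and counted.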
I'm not sure whether I should file this issue in this repo (fdreadoutlibs) or in the daqconf repo, but what I observe is complaints in the RU log files about a failure to write to the m_tp_sink queue when using WIB2 emulated data.
Here is the hw_map.txt file that I'm using:
Here is the daqconf.json file that I'm using:
Here are the steps that I used to demonstrate the problem:
The logfile grep shows messages like the following: