DUNE-DAQ / ehn1-operations-issues

Non-code repo used specifically for keeping track of DAQ operations-related issues

Occasional warnings of overlap with previous window #20

Closed wesketchum closed 8 months ago

wesketchum commented 8 months ago

Running at a 10 Hz rate with CTB as HSI, I am seeing an occasional warning from the trigger app:

TC of type 5, timestamp 106738818425514731 overlaps with previous TD readout window: [106738818425514731, 106738818425514763] DAQModule: mlt

wesketchum commented 8 months ago

This is also occurring with a fake HSI at 10Hz.

wesketchum commented 8 months ago

But I am not seeing it at a ~1 Hz rate, or at least not with a frequency I'd be patient enough to see in quick testing.

wesketchum commented 8 months ago

OK, some more details here.

This was done using the /nfs/sw/dunedaq/dunedaq-fddaq-v4.3.0-rc1-dev area on the EHN1 cluster. Configs for that are located in np04daq-configs/DAQ_CONFS/np04_daq_WIB_conf inside that directory. The trigger configuration for that is attached. trigger_conf.json

(global config is global_configs/np04_WIB.json)

MRiganSUSX commented 8 months ago

Hi Wes,

As far as I can see, your configuration /nfs/sw/dunedaq/dunedaq-fddaq-v4.3.0-rc1-dev/np04daq-configs/DAQ_CONFS/np04_daq_WIB_conf/config/np04_daq.json :

Explanation of the error: the MLT buffers each decision (which is pending at this point) for a short time in case new TCs arrive, so that they can be merged into this TD (if merging is on). After this short wait, the TD is prepared and sent. A vector of sent TDs is kept. When a new TC arrives, it is checked whether its readout window overlaps with the current pending TDs BUT ALSO whether it overlaps with the previous, already sent, TD. This latter check will ONLY happen if the TC arrived late, i.e. after the buffer wait time for the previous TD that this new TC overlaps (otherwise it would simply have been added to the previous TD thanks to merging). The main purpose of this check is to monitor for these late TCs.
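To make the check described above concrete, here is a minimal Python sketch of the decision logic. It is not the actual ModuleLevelTrigger code: the names `Window`, `overlaps` and `classify_tc` are hypothetical, merging is assumed to be on, and readout windows are treated as closed intervals in timestamp ticks. The numbers in the usage lines are the ones from the warning quoted at the top of this issue.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Window:
    start: int  # readout window start, in timestamp ticks
    end: int    # readout window end, in timestamp ticks


def overlaps(a: Window, b: Window) -> bool:
    """True if the two closed intervals share at least one tick."""
    return a.start <= b.end and b.start <= a.end


def classify_tc(tc: Window,
                pending: List[Window],
                last_sent: Optional[Window]) -> Tuple[str, Optional[int]]:
    """Hypothetical outline of the MLT decision for a newly arrived TC."""
    # 1) On-time case: the TC overlaps a TD that is still being buffered,
    #    so it can be merged into that pending TD.
    for i, td in enumerate(pending):
        if overlaps(tc, td):
            return ("merge", i)
    # 2) Late case: nothing pending overlaps, but the TD that was already
    #    sent does. This is the situation that produces the warning.
    if last_sent is not None and overlaps(tc, last_sent):
        return ("late_overlap_warning", None)
    # 3) Otherwise the TC starts a new pending TD.
    return ("new_td", None)


# Values taken from the warning message in this issue.
sent_td = Window(106738818425514731, 106738818425514763)
late_tc = Window(106738818425514731, 106738818425514763)
result = classify_tc(late_tc, pending=[], last_sent=sent_td)
```

In this sketch the same TC would have been merged rather than flagged if the TD had still been in the `pending` list when the TC arrived.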

In your case: you have a TC of type 5, which is Prescale, that arrived late and overlapped the previous TD. I believe this warning is an actual feature and works as expected. You are either unlucky, in the sense that a TC made from TPGen TPs overlapped the previous random trigger, or it overlapped the previous Prescale TPGen trigger because the rates are high. This is not an issue with the CTB / HSI trigger per se, but there is definitely an issue with a TC arriving at the MLT later than expected.

Options:

wesketchum commented 8 months ago

OK, thanks @MRiganSUSX . I think actually the main issue that I had was that I thought I was disabling the prescale algorithm, but clearly I was not.

I was able to configure this by adding

"mlt_ignore_tc": [0,2,3,4,5,6,7,8,9]

and I see that there are no more warnings.
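For readers unfamiliar with the setting, the effect of an ignore list like the one above can be sketched as a simple type filter. This is only an illustration of the idea: `accept_tc` is a hypothetical helper, and how the real MLT applies `mlt_ignore_tc` internally is not shown in this thread.

```python
def accept_tc(tc_type: int, ignore_types: set) -> bool:
    """Hypothetical filter: keep a TC only if its type is not ignored."""
    return tc_type not in ignore_types


# The list from the configuration above; type 5 is Prescale per this thread.
mlt_ignore_tc = {0, 2, 3, 4, 5, 6, 7, 8, 9}

# Which types in the range 0-9 would still be accepted by this filter?
accepted = [t for t in range(10) if accept_tc(t, mlt_ignore_tc)]
```

With this list, type 1 is the only type in the range 0-9 that passes the filter, and type 5 (Prescale) TCs are dropped before any overlap check could flag them.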

Still ... I suspect that there is something funny going on. From the warning above, it's far too convenient that the timestamp it complains about, 106738818425514731, matches exactly the beginning of the previous TD readout window. It's always this way, which I certainly would not expect for prescales coming from random TPs. I also see a warning now when configuring the CTB like:

 Mesage from CTB: "HLT_4 has prescale 0x0. Minimum is 0x1 (no prescale). Readjusting."

... is there a prescale being applied to TCs that come from the HSI or CTB? If so, is that causing this overlap?

wesketchum commented 8 months ago

Just to note: when I have the mlt_ignore_tc, then I no longer see the warning messages, so that's good, and I think that's a reasonable solution. But I'd still like to understand if TCs from the CTB/HSI path are also getting prescaled with TCs created, and how we could disable that for these types. I'm actually wondering if this is due to the new feature that was added to allow for prescaling on these (which was very appreciated and useful for calibration runs...)

MRiganSUSX commented 8 months ago

Hi, so, if I understand the system correctly, the CTB signals come via the same route as all HSI signals, i.e. they go through the TimingTriggerCandidateMaker. Therefore, the Prescale that you apply for XAMaker or XCMaker is not applied to these (different path). Here is a pic: trigger

However, as you mentioned, the timing prescale would apply to these, so "ttcm_prescale": 1, would apply to these. Yours is set to 1, which means no prescaling is taking place and every single HSI event creates a TC.

I should note this: I do not know/understand how the hsi_events themselves are created; I can only confirm that the TimingTCM does not distinguish between different types. So if it is possible to send both timing HSI and CTB HSI, they would all go to the TTCM and would be counted against the same prescale counter. This does not affect the timestamps; it is just a separate note.

For the message you are referring to, "Mesage from CTB": this comes from the ctbmodules repo, before the trigger. I have no expertise on this.
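As a rough illustration of the shared prescale counter described above, here is a small Python sketch. It is not the TimingTriggerCandidateMaker implementation: the class `PrescaleCounter` is hypothetical, and the convention that the Nth event in each group of N is the one that passes is an assumption made for the example.

```python
class PrescaleCounter:
    """Hypothetical prescale: pass one event out of every `prescale`."""

    def __init__(self, prescale: int):
        if prescale < 1:
            raise ValueError("prescale must be >= 1")
        self.prescale = prescale
        self.count = 0

    def accept(self) -> bool:
        # One counter for all incoming HSI events, whatever their origin.
        self.count += 1
        return self.count % self.prescale == 0


# Made-up mix of HSI events from two sources, in arrival order.
events = ["timing", "ctb", "ctb", "timing", "ctb", "timing"]

# Prescale of 1: every HSI event produces a TC.
no_prescale = PrescaleCounter(1)
passed_all = [e for e in events if no_prescale.accept()]

# Prescale of 3: both sources are counted against the same counter.
shared = PrescaleCounter(3)
passed_third = [e for e in events if shared.accept()]
```

With a prescale of 1 the output equals the input, which matches the statement that no prescaling takes place. With a larger value, which source's events survive depends only on arrival order, since the counter does not distinguish between sources.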

Re the timestamps: if you have merging turned off OR you are ignoring everything but timing TCs, AND you know that they arrive at a reasonable rate where no overlap is expected (say 10 Hz, equally spaced events), the timestamps should not be modified at the MLT level. I would look into the timestamps of the arriving HSI events to see whether they make sense. You can achieve this by increasing the debug level to 3: https://github.com/DUNE-DAQ/trigger/blob/develop/plugins/TimingTriggerCandidateMaker.cpp#L127

If these look OK, you can look at what's happening in the MLT:
Receiving TCs, debug 1: https://github.com/DUNE-DAQ/trigger/blob/develop/plugins/ModuleLevelTrigger.cpp#L337
Sending TD, debug 3: https://github.com/DUNE-DAQ/trigger/blob/develop/plugins/ModuleLevelTrigger.cpp#L455
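Once the HSI event timestamps have been pulled out of the debug logs, a quick spacing check along these lines may help spot duplicates or irregular gaps. This is only a sketch: `check_spacing` is a hypothetical helper, the 62.5 MHz clock frequency is an assumption that should be replaced with the real timestamp clock of the system, and the timestamps in the example are made up (only the first value is taken from the warning in this issue).

```python
def check_spacing(timestamps, expected_gap, tolerance):
    """Return (index, gap) for each consecutive pair whose gap is off."""
    bad = []
    for i in range(1, len(timestamps)):
        gap = timestamps[i] - timestamps[i - 1]
        if abs(gap - expected_gap) > tolerance:
            bad.append((i, gap))
    return bad


CLOCK_HZ = 62_500_000  # ASSUMPTION: timestamp clock frequency in Hz
RATE_HZ = 10           # trigger rate discussed in this issue
expected_gap = CLOCK_HZ // RATE_HZ  # ticks between equally spaced events

# Made-up timestamps: the third entry repeats the second one.
base = 106738818425514731
timestamps = [
    base,
    base + expected_gap,
    base + expected_gap,
    base + 2 * expected_gap,
]

suspicious = check_spacing(timestamps, expected_gap, tolerance=1000)
```

A repeated timestamp shows up as a gap of 0, which is the kind of pattern that would explain a TC landing exactly on the start of the previous TD readout window.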

wesketchum commented 8 months ago

Conditions here are understood, and disabling trigger prescale types in the MLT prevents the warning messages from appearing. Closing.