DUNE-DAQ / hsilibs

HSI related code.

FakeHSIEventGenerator should not produce subsequent HSIEvents with the same timestamp #4

Open bieryAtFnal opened 2 years ago

bieryAtFnal commented 2 years ago

This has probably come up in other contexts, but it was most recently noticed in the N22-09-12 nightly build, when running the integtest/test_pacman-raw.py integration test with Julia's changes to lbrulibs.

One way to view the problem is this: if data packets arrive in the latency buffer at a lower rate than the FakeHSIEventGenerator generates its periodic HSIEvents, two consecutive HSIEvents can end up with the same timestamp. This leads to duplicate TriggerDecisions from the MLT with the same trigger_time and the same readout window, and duplicate TriggerDecisions are clearly bad. Consecutive HSIEvents can share a timestamp because the TimeSync messages from the Readout subsystem are only sent (and so the estimate only updated) when new data is found in the latency buffer. If that rate is lower than the configured trigger rate, the FakeHSIEventGenerator uses the same TimestampEstimator value for consecutive HSIEvents.
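A minimal sketch of a generator-side guard against this, in Python for brevity. `StaleTimestampEstimator`, `GuardedHSIGenerator` and their methods are hypothetical stand-ins, not the hsilibs API: the estimator only changes its value when a TimeSync arrives (the behaviour described above), and the generator drops any HSIEvent whose timestamp is not strictly later than the previous one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StaleTimestampEstimator:
    """Hypothetical stand-in: the estimate only changes on a TimeSync."""
    estimate: int = 0

    def on_timesync(self, daq_time: int) -> None:
        self.estimate = daq_time

    def get_timestamp_estimate(self) -> int:
        return self.estimate


class GuardedHSIGenerator:
    """Hypothetical generator that never emits two HSIEvents whose
    timestamps are not strictly increasing."""

    def __init__(self, estimator: StaleTimestampEstimator) -> None:
        self.estimator = estimator
        self.last_timestamp: Optional[int] = None
        self.skipped = 0

    def generate(self) -> Optional[int]:
        ts = self.estimator.get_timestamp_estimate()
        if self.last_timestamp is not None and ts <= self.last_timestamp:
            # The estimate has not advanced since the last HSIEvent; emitting
            # it would give the MLT a second TD with the same trigger_time.
            self.skipped += 1
            return None
        self.last_timestamp = ts
        return ts


est = StaleTimestampEstimator()
gen = GuardedHSIGenerator(est)
est.on_timesync(1000)
events = [gen.generate(), gen.generate()]  # no TimeSync between these two calls
est.on_timesync(2000)
events.append(gen.generate())
print(events, gen.skipped)  # [1000, None, 2000] 1
```

Dropping the repeated event, rather than nudging its timestamp forward by a tick, avoids requesting a readout window that the stale estimate does not actually reflect; it only treats the symptom, though, since the estimate itself is still stale.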

In the integration test mentioned above, PACMAN data is sent to the Readout App at approximately 1 Hz, and the FakeHSIEventGenerator is configured to generate triggers at 1 Hz. Because the two 1 Hz rates can drift slightly out of sync, two HSIEvents can occasionally be created from the same TimeSync update.
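The effect of two nearly equal rates can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not a model of the real system: times are integer milliseconds, a TimeSync is sent with zero latency whenever a data packet arrives, and each HSIEvent takes the DAQ time of the newest TimeSync. The function name `count_repeated_timestamps` is hypothetical.

```python
def count_repeated_timestamps(data_period_ms: int, trigger_period_ms: int,
                              duration_ms: int) -> int:
    """Count HSIEvents that reuse the previous event's timestamp, assuming
    the estimate only updates when a TimeSync (i.e. new data) arrives."""
    timesync_times = range(0, duration_ms, data_period_ms)
    trigger_times = range(trigger_period_ms, duration_ms, trigger_period_ms)
    repeats = 0
    previous = None  # timestamp of the previous HSIEvent
    latest = None    # DAQ time carried by the newest TimeSync seen so far
    idx = 0
    for t in trigger_times:
        # Apply every TimeSync that has arrived at or before this trigger.
        while idx < len(timesync_times) and timesync_times[idx] <= t:
            latest = timesync_times[idx]
            idx += 1
        if latest is not None and latest == previous:
            repeats += 1  # no new TimeSync since the previous HSIEvent
        previous = latest
    return repeats


# Data every 1010 ms (slightly below 1 Hz) vs. triggers at exactly 1 Hz, 200 s.
print(count_repeated_timestamps(1010, 1000, 200_000))  # 1
print(count_repeated_timestamps(1000, 1000, 200_000))  # 0
```

With a 1% rate mismatch the data phase slips by a full period roughly every 100 s, so one repeated timestamp appears in this 200 s window; with real timing jitter the repeats would be less regular.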

For reference, here is a related post from Phil in July (19-July-2022, dunedaq-integration Slack channel):

a possible minor issue with the new run stop method: it looks like the part of the readout that sends timesyncs is stopped before the fake hsi generator is stopped. when no timesyncs are sent, the timestamp estimator doesn't update its timestamp estimate, so the fake hsi generator sends requests for the same timestamp repeatedly. it's only a minor problem because the triggers have already been paused, so the mlt is not sending out TDs for those late hsi events. but it is a problem with the new mlt logic coming in 3.2: with that code, the mlt correctly prints an error for the late hsi TCs, which makes the integration tests fail (they check for no errors in the logs). i think my preferred solution is to fix this at source and have the timestamp estimator continue to update its estimate even when no timesyncs are received
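Phil's preferred fix, having the timestamp estimator keep advancing between TimeSyncs, could look roughly like the sketch below. It is an assumption-laden illustration, not the TimestampEstimator's actual implementation: the class name, the injectable `now_s` clock and the 62.5 MHz clock frequency used in the example are all assumptions. The estimate is the DAQ time of the last TimeSync plus the local monotonic time elapsed since it was received, converted to clock ticks.

```python
import time
from typing import Callable, Optional


class ExtrapolatingTimestampEstimator:
    """Hypothetical estimator that extrapolates DAQ time between TimeSyncs
    using a local monotonic clock."""

    def __init__(self, clock_frequency_hz: int,
                 now_s: Callable[[], float] = time.monotonic) -> None:
        self.clock_frequency_hz = clock_frequency_hz
        self.now_s = now_s
        self.last_sync_ticks: Optional[int] = None
        self.last_sync_local_s = 0.0

    def on_timesync(self, daq_time_ticks: int) -> None:
        # Record the DAQ time and the local time at which it was received.
        self.last_sync_ticks = daq_time_ticks
        self.last_sync_local_s = self.now_s()

    def get_timestamp_estimate(self) -> Optional[int]:
        if self.last_sync_ticks is None:
            return None  # no TimeSync yet, nothing to extrapolate from
        elapsed_s = self.now_s() - self.last_sync_local_s
        return self.last_sync_ticks + int(elapsed_s * self.clock_frequency_hz)


# Fake clock so the example is deterministic; assumed 62.5 MHz DAQ clock.
fake_now = [0.0]
est = ExtrapolatingTimestampEstimator(62_500_000, now_s=lambda: fake_now[0])
est.on_timesync(1_000_000)
fake_now[0] = 1.0  # one second later, still no new TimeSync
print(est.get_timestamp_estimate())  # 63500000
fake_now[0] = 2.0
print(est.get_timestamp_estimate())  # 126000000
```

Because the estimate now moves even when no TimeSyncs arrive, consecutive HSIEvents no longer share a timestamp in the slow-data case or after the TimeSync-sending part of readout has been stopped.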