Mu2e / otsdaq_mu2e

Mu2e customizations for otsdaq
Other
1 stars 5 forks source link

Bad Omen ? #60

Open pavel1murat opened 8 months ago

pavel1murat commented 8 months ago

I'm running an artdaq job with the boardreader getNext function returning two artdaq fragments , both of fixed length, one with the data payload of 4 bytes and another one with the data payload of about 50*8 bytes.

About 30 min into the data taking run, artdaq starts complaining about something it calls "Bad Omen" - see below.

How does one deal with that ?

I'm using the v2_07_00_e28_s126 ots bundle compiled in debug mode, the full boardreader FCL is attached below.

-- thanks, Pasha

%MSG-i tracker_vst_BoardReaderCore:  Early 19-Dec-2023 08:27:54 CST pre-events BoardReaderCore.cc:226
Completed the Start transition (Started run) for run 105186, timeout = 600, timestamp = 18446744073709551615
%MSG
%MSG-i tracker_vst_CommandableInterface:  Early 19-Dec-2023 08:27:54 CST  pre-events Commandable.cc:89
Start transition complete
%MSG
%MSG-i tracker_vst_FragmentBuffer:  Early 19-Dec-2023 08:28:01 CST pre-events FragmentBuffer.cc:846
Copying 1 System Fragments into output
%MSG
%MSG-i tracker_vst_FragmentBuffer:  Early 19-Dec-2023 08:28:08 CST pre-events FragmentBuffer.cc:846
Copying 1 System Fragments into output
%MSG

..... snip...

%MSG-i tracker_vst_BoardReaderCore:  Early 19-Dec-2023 09:01:55 CST pre-events BoardReaderCore.cc:544
tracker_vst statistics:
  Fragments read: 170407 fragments generated at 1419.85 getNext calls/sec, fragment rate = 2839.85 fragments/sec, monitor window = 60.0056 sec, min::max read size = 2::3 fragments  Average times per fragment:  elapsed time = 0.000704299 sec
  Fragment output statistics: 170407 fragments sent at 2839.85 fragments/sec, effective data rate = 0.704125 MB/sec, monitor window = 60.0056 sec, min::max event size = 3.8147e-05::0.000457764 MB
  Input wait time = 0.000671079 s/fragment, buffer wait time = 3.06895e-05 s/fragment, request wait time = 0.00031376 s/fragment, output wait time = 2.91261e-05 s/fragment
%MSG
%MSG-w tracker_vst_FragmentBuffer:  Early 19-Dec-2023 09:02:01 CST pre-events FragmentBuffer.cc:292
Bad Omen: Data Buffer has exceeded its size limits. (seq_id=2908885, frag_id=11, frags=1001/1000, szB=480480/1048576000), timestamps=2908885-2909885
%MSG

here is the full tracker_vst.fcl : tracker_vst_fcl.txt

pavel1murat commented 8 months ago

the problem is reproducible, starts 35 min into the run

------------------------------------------------------------------------------
%MSG-i tracker_vst_BoardReaderCore:  Early 20-Dec-2023 12:23:28 CST pre-events BoardReaderCore.cc:226
Completed the Start transition (Started run) for run 105187, timeout = 600, timestamp = 18446744073709551615
%MSG
%MSG-i tracker_vst_CommandableInterface:  Early 20-Dec-2023 12:23:28 CST  pre-events Commandable.cc:89
Start transition complete
%MSG
%MSG-i tracker_vst_FragmentBuffer:  Early 20-Dec-2023 12:23:35 CST pre-events FragmentBuffer.cc:846
Copying 1 System Fragments into output
%MSG
... snip...
%MSG-i tracker_vst_FragmentBuffer:  Early 20-Dec-2023 12:58:38 CST pre-events FragmentBuffer.cc:846
Copying 1 System Fragments into output
%MSG
%MSG-w tracker_vst_FragmentBuffer:  Early 20-Dec-2023 12:58:41 CST pre-events FragmentBuffer.cc:292
Bad Omen: Data Buffer has exceeded its size limits. (seq_id=2973615, frag_id=11, frags=1001/1000, szB=480480/1048576000), timestamps=2973615-2974615
%MSG
.. snip...
------------------------------------------------------------------------------
pavel1murat commented 8 months ago

it looks that this diagnostics happens when artdaq loses events in communication between the processes