cta-observatory / cta-lstchain

LST prototype testbench chain
https://cta-observatory.github.io/cta-lstchain/
BSD 3-Clause "New" or "Revised" License

Empty events in the last files of the runs #684

Open morcuended opened 3 years ago

morcuended commented 3 years ago

Just for the record: sometimes the r0 to dl1 step fails for the last files of a given run. This error happens, for example, in the last two subruns (170 and 171) of run 4343 from 2021-04-08.

@maxnoe checked this some time ago and showed that those events are filled with zeros. The first 4544 events have a waveform of length 74200 (one gain); only the last 25 have 148400.

CameraEvent(
    configuration_id=0
    event_id=0
    tel_event_id=0
    trigger_time_s=0
    trigger_time_qns=0
    trigger_type=0
    waveform=array([0, 0, ..., 0, 0], dtype=uint16)
    pixel_status=array([0, 0, ..., 0, 0], dtype=uint8)
    ped_id=0
    nectarcam=NectarCamEvent(
        module_status=array([], dtype=float64)
        extdevices_presence=0
        tib_data=array([], dtype=float64)
        cdts_data=array([], dtype=float64)
        swat_data=array([], dtype=float64)
        counters=array([], dtype=float64))
    lstcam=LstCamEvent(
        module_status=array([0, 0, ..., 0, 0], dtype=uint8)
        extdevices_presence=0
        tib_data=array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint8)
        cdts_data=array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=uint8)
        swat_data=array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
               0, 0, 0, 0, 0], dtype=uint8)
        counters=array([0, 0, ..., 0, 0], dtype=uint8)
        chips_flags=array([0, 0, ..., 0, 0], dtype=uint16)
        first_capacitor_id=array([0, 0, ..., 0, 0], dtype=uint16)
        drs_tag_status=array([0, 0, ..., 0, 0], dtype=uint8)
        drs_tag=array([0, 0, ..., 0, 0], dtype=uint16))
    digicam=DigiCamEvent(
        ))

The problem is probably related to the process of stopping the run during data taking. But it's getting more and more common.

Right now the solution is to discard those runs after seeing that they fail with this error.

Command:

lstchain_data_r0_to_dl1 \
  --input-file=/fefs/aswg/data/real/R0/20210408/LST-1.1.Run04343.0170.fits.fz \
  --output-dir=/fefs/aswg/data/real/running_analysis/20210408/v0.7.1 \
  --pedestal-file=/fefs/aswg/data/real/running_analysis/20210408/v0.7.1/drs4_pedestal.Run04340.0000.fits \
  --calibration-file=/fefs/aswg/data/real/running_analysis/20210408/v0.7.1/calibration.Run04341.0000.hdf5 \
  --time-calibration-file=/fefs/aswg/data/real/running_analysis/20210408/v0.7.1/time_calibration.Run04341.0000.hdf5 \
  --pointing-file=/fefs/aswg/data/real/monitoring/DrivePositioning/drive_log_21_04_08.txt \
  --run-summary-path=/fefs/aswg/data/real/monitoring/RunSummary/RunSummary_20210408.ecsv

Error:

Found duplicated column obs_id, skipping
Found duplicated column event_id, skipping
Traceback (most recent call last):
  File "/fefs/aswg/software/virtual_env/anaconda3/envs/osa-env/lib/python3.7/site-packages/ctapipe_io_lst/__init__.py", line 615, in fill_r0_camera_container
    self.camera_config.num_samples
ValueError: cannot reshape array of size 74200 into shape (2,1855,40)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/fefs/aswg/software/virtual_env/anaconda3/envs/osa-env/bin/lstchain_data_r0_to_dl1", line 8, in <module>
    sys.exit(main())
  File "/fefs/aswg/software/virtual_env/anaconda3/envs/osa-env/lib/python3.7/site-packages/lstchain/scripts/lstchain_data_r0_to_dl1.py", line 185, in main
    custom_config=config,
  File "/fefs/aswg/software/virtual_env/anaconda3/envs/osa-env/lib/python3.7/site-packages/lstchain/reco/r0_to_dl1.py", line 305, in r0_to_dl1
    for i, event in enumerate(source):
  File "/fefs/aswg/software/virtual_env/anaconda3/envs/osa-env/lib/python3.7/site-packages/ctapipe/io/eventsource.py", line 274, in __iter__
    for event in self._generator():
  File "/fefs/aswg/software/virtual_env/anaconda3/envs/osa-env/lib/python3.7/site-packages/ctapipe_io_lst/__init__.py", line 334, in _generator
    self.fill_r0_container(array_event, zfits_event)
  File "/fefs/aswg/software/virtual_env/anaconda3/envs/osa-env/lib/python3.7/site-packages/ctapipe_io_lst/__init__.py", line 642, in fill_r0_container
    zfits_event
  File "/fefs/aswg/software/virtual_env/anaconda3/envs/osa-env/lib/python3.7/site-packages/ctapipe_io_lst/__init__.py", line 619, in fill_r0_camera_container
    f"Number of gains not correct, waveform shape is {zfits_event.waveform.shape[0]}"
ValueError: Number of gains not correct, waveform shape is 74200 instead of 148400
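
The two lengths in the error message follow from the shape in the first traceback, (2,1855,40): one gain holds 1855 x 40 = 74200 values and two gains hold 148400. A minimal sketch of that arithmetic, assuming only the shape printed above; the function name `infer_n_gains` is hypothetical and not part of ctapipe_io_lst:

```python
# Dimensions taken from the shape in the traceback: (2, 1855, 40).
N_PIXELS = 1855
N_SAMPLES = 40


def infer_n_gains(waveform_length, n_pixels=N_PIXELS, n_samples=N_SAMPLES):
    """Return how many gain channels a flat waveform of this length holds.

    Raises ValueError if the length is not a whole multiple of
    n_pixels * n_samples.
    """
    per_gain = n_pixels * n_samples
    n_gains, remainder = divmod(waveform_length, per_gain)
    if remainder != 0 or n_gains == 0:
        raise ValueError(
            f"waveform length {waveform_length} is not a multiple of {per_gain}"
        )
    return n_gains


one_gain = infer_n_gains(74200)
two_gains = infer_n_gains(148400)
```

A waveform of length 74200 therefore cannot be reshaped to two gains, which is what the first `ValueError` reports.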

maxnoe commented 3 years ago

> But it's getting more and more common.

Then we should also open an issue with the DAQ people, right? This should not happen in the first place.

maxnoe commented 3 years ago

A simple check would be event_id == 0, since that starts at 1.
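
A rough sketch of that check, assuming events expose an `event_id` attribute as in the dump above. `FakeEvent` and `skip_empty_events` are hypothetical names used only for illustration, not the actual ctapipe_io_lst code:

```python
import logging
from dataclasses import dataclass
from typing import Iterable, Iterator

log = logging.getLogger(__name__)


@dataclass
class FakeEvent:
    """Hypothetical stand-in for a camera event; only event_id is modelled."""

    event_id: int


def skip_empty_events(events: Iterable[FakeEvent]) -> Iterator[FakeEvent]:
    """Yield only events with a non-zero event_id (real IDs start at 1)."""
    for event in events:
        if event.event_id == 0:
            log.warning("Empty event (event_id=0) found, skipping")
            continue
        yield event


events = [FakeEvent(1), FakeEvent(2), FakeEvent(0), FakeEvent(0)]
kept_ids = [event.event_id for event in skip_empty_events(events)]
```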

maxnoe commented 3 years ago

The question is should we stop processing at the first of such "empty" events or just skip them?
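
To make the two options concrete, here is a small sketch on a made-up list of event IDs (the sequence is hypothetical, not read from a real file). The options only differ if a filled event follows an empty one:

```python
from itertools import takewhile


def is_filled(event_id: int) -> bool:
    """An event counts as filled when its event_id is non-zero."""
    return event_id != 0


# Hypothetical IDs with an empty event in the middle of the file.
event_ids = [1, 2, 0, 3, 0, 0]

# Option 1: stop at the first empty event.
stop_at_first = list(takewhile(is_filled, event_ids))

# Option 2: skip empty events and keep going.
skip_all = [event_id for event_id in event_ids if is_filled(event_id)]
```

If the empty events are always at the end of the file, both options keep the same events.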

morcuended commented 3 years ago

> But it's getting more and more common.

> Then we should also open an issue with the DAQ people, right? This should not happen in the first place.

I will open a ticket in Redmine.

> The question is should we stop processing at the first of such "empty" events or just skip them?

Right now I simply discard the whole file, even if it also contains good events. There is currently no other way to handle it.

maxnoe commented 3 years ago

Are there good events after the empty ones? Or are the empty ones always at the back?
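
One way to answer this for a given file would be to collect the event IDs and test whether any filled event follows an empty one. A minimal sketch, assuming the IDs are already available as a plain sequence; the helper name is hypothetical:

```python
def empty_events_only_at_end(event_ids):
    """Return True if no filled event (event_id != 0) follows an empty one."""
    seen_empty = False
    for event_id in event_ids:
        if event_id == 0:
            seen_empty = True
        elif seen_empty:
            # A filled event after an empty one: empties are not only trailing.
            return False
    return True


trailing_only = empty_events_only_at_end([1, 2, 3, 0, 0])
interleaved = empty_events_only_at_end([1, 0, 2])
```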

morcuended commented 3 years ago

No idea. Did not check that.

maxnoe commented 3 years ago

Can you try with this? https://github.com/cta-observatory/ctapipe_io_lst/pull/99

morcuended commented 3 years ago

Let me try

morcuended commented 3 years ago

Redmine page: https://forge.in2p3.fr/issues/44538

morcuended commented 3 years ago

@maxnoe, the fix in cta-observatory/ctapipe_io_lst#99 worked fine in those two files.

maxnoe commented 3 years ago

Ok, then let's merge that and make a new release for ctapipe_io_lst.

maxnoe commented 3 years ago

Ah, already done by @rlopezcoto, great!

morcuended commented 3 years ago

It worked almost all the time, but I spotted a different error related to this in run 04135.0157 (the last file of the run):

...
Event with event_id=0 found, skipping
Event with event_id=0 found, skipping
Traceback (most recent call last):
  File "/fefs/aswg/software/virtual_env/anaconda3/envs/osa/bin/lstchain_data_r0_to_dl1", line 8, in <module>
    sys.exit(main())
  File "/fefs/aswg/software/virtual_env/anaconda3/envs/osa/lib/python3.7/site-packages/lstchain/scripts/lstchain_data_r0_to_dl1.py", line 185, in main
    custom_config=config,
  File "/fefs/aswg/software/virtual_env/anaconda3/envs/osa/lib/python3.7/site-packages/lstchain/reco/r0_to_dl1.py", line 575, in r0_to_dl1
    new_ped, new_ff = calibration_calculator.output_interleaved_results(event)
UnboundLocalError: local variable 'event' referenced before assignment

maxnoe commented 3 years ago

That happens when there are no events at all.

maxnoe commented 3 years ago

The code is very problematic: it mixes the calculation with storing the result in the event and with the IO at the end. I would try to fix that, but for the moment I would only check that at least one event has been processed.
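
A stripped-down illustration of the failure and of that stopgap check. This is not the lstchain code; both function names are hypothetical and the "finalization" step is reduced to returning the last event:

```python
def finalize_unguarded(source):
    """Use the loop variable after the loop, as in the failing traceback."""
    for event in source:
        pass  # per-event processing would go here
    # If source yielded nothing, 'event' was never assigned.
    return event


def finalize_guarded(source):
    """Same flow, but only finalize if at least one event was processed."""
    event = None
    n_events = 0
    for event in source:
        n_events += 1
    if n_events == 0:
        return None  # nothing was processed, so skip the final step
    return event


try:
    finalize_unguarded([])
    unguarded_failed = False
except UnboundLocalError:
    unguarded_failed = True

guarded_result = finalize_guarded([])
```

With the guard, a file in which every event was skipped ends cleanly instead of raising `UnboundLocalError`.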