cta-observatory / ctapipe

Low-level data processing pipeline software for CTAO or similar arrays of Imaging Atmospheric Cherenkov Telescopes
https://ctapipe.readthedocs.org
BSD 3-Clause "New" or "Revised" License
64 stars 268 forks source link

HDF5EventSource.__len__ is wrong if `allowed_tels` is given #2324

Open maxnoe opened 1 year ago

maxnoe commented 1 year ago

Describe the bug

When allowed_tels is set for HDF5EventSource, events with no telescopes remaining in the selection are skipped, thus changing the total number of array events.

Currently, __len__ simply looks at the number of array events in the input table, thus resulting in a too large number if allowed_tels is given.

To Reproduce

Steps to reproduce the behavior:

Run ctapipe-process -i dataset://gamma_prod5.simtel.zst -o /tmp/test.dl1.h5

❯ ctapipe-process -i dataset://gamma_prod5.simtel.zst -o /tmp/test.dl1.h5 --progress
2023-05-09 17:33:13,462 WARNING [ctapipe.core.telescope_component] (telescope_component.attach_subarray): TelescopeParameter type argument 'SST_1M_*' did not match any known telescope types
SimTelEventSource: 7ev [00:06,  1.07ev/s]
❯ ctapipe-process -i /tmp/test.dl1.h5 -o /tmp/test2.dl1.h5 --EventSource.allowed_tels=1 --EventSource.allowed_tels=2 --EventSource.allowed_tels=3 --EventSource.allowed_tels=4 --progress
2023-05-09 17:34:38,508 WARNING [ctapipe.core.telescope_component] (telescope_component.attach_subarray): TelescopeParameter type argument 'MST_*' did not match any known telescope types
2023-05-09 17:34:38,508 WARNING [ctapipe.core.telescope_component] (telescope_component.attach_subarray): TelescopeParameter type argument 'SST_1M_*' did not match any known telescope types
HDF5EventSource:  14%|███████▋                                              | 1/7 [00:01<00:09,  1.56s/ev]

Observe that the progress bar thinks there will be 7 events but only 1 is actually encountered.

Expected behavior __len__ provides the correct length

maxnoe commented 1 year ago

There are I think three possible solutions:

1) Keep the current behavior of the source but remove __len__, making it a pure iterator with an unknown a-priori length 2) Keep the current, simple length implementation but yield also "empty" events (that could then be taken care of by the Softwaretrigger) 3) Implement a more complex __len__ that looks at the full trigger table if allowed_tels are given

kosack commented 1 year ago

I like having a __len__ attribute, even if it is not totally reliable, as it still gives useful information. So I'd prefer either keeping it the way it is (in which case the progress bar is an upper limit, I guess, but still somewhat useful), or implement suggestion 2 or 3.