cta-observatory / ctapipe

Low-level data processing pipeline software for CTAO or similar arrays of Imaging Atmospheric Cherenkov Telescopes
https://ctapipe.readthedocs.org
BSD 3-Clause "New" or "Revised" License

`get_hdf5_datalevels` seems broken #2450

Closed Tobychev closed 8 months ago

Tobychev commented 8 months ago

Describe the bug: The function get_hdf5_datalevels reports no data levels for files that clearly contain data-level data.

To Reproduce

import ctapipe.io as cio
import pathlib
g_meeting_file_name = pathlib.Path("../2023 Oct F2F/2023-10-datapipe-workshop-material/data/model_training/gamma_20deg_0deg___cta-prod5b-lapalma_desert-2158m-LaPalma-dark_cone10.alpha_train_cl_merged.DL2.h5")
cio.get_hdf5_datalevels(g_meeting_file_name)

returns ()

Expected behavior: That it returns at least DataLevel.DL1_PARAMETERS.

maxnoe commented 8 months ago

Checking with ctapipe-fileinfo, the same file claims to contain images, which is also wrong. It seems the merge tool didn't adapt the metadata to the options it was given:

        PRODUCT:
            CREATION:
                TIME: '2023-04-18 06:52:01.624'
            DATA:
                ASSOCIATION: Subarray
                CATEGORY: Sim
                LEVELS: DL1_IMAGES,DL1_PARAMETERS,DL2
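
For reference, a metadata-based detection would simply parse that comma-separated LEVELS value. A minimal sketch, assuming the attribute arrives as a plain string:

```python
# Hypothetical sketch: deriving data levels from the CTA PRODUCT
# metadata shown above, rather than from the file's groups.
# The LEVELS attribute is a comma-separated list of level names.
levels_attr = "DL1_IMAGES,DL1_PARAMETERS,DL2"
levels = tuple(level.strip() for level in levels_attr.split(","))
print(levels)  # -> ('DL1_IMAGES', 'DL1_PARAMETERS', 'DL2')
```

As this thread shows, though, the metadata can be stale after merging, which is exactly why it can disagree with what the file actually contains.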
maxnoe commented 8 months ago

Ok, so get_hdf5_datalevels actually inspects the groups present in the HDF5 file, not the metadata.
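
That group-based detection can be sketched roughly like this. This is hypothetical code, not the actual ctapipe implementation: a plain set of group paths stands in for an open tables.File, and the path strings are assumptions modeled on the ctapipe file layout:

```python
# Hypothetical sketch of group-based data level detection.
# The real ctapipe code inspects the nodes of an open tables.File;
# here a plain set of group paths stands in for the HDF5 structure.

DATALEVEL_GROUPS = {
    "DL1_IMAGES": "/dl1/event/telescope/images",
    "DL1_PARAMETERS": "/dl1/event/telescope/parameters",
    "DL2": "/dl2",
}

def detect_datalevels(group_paths):
    """Return the data levels whose group is present in the file structure."""
    return tuple(
        level for level, path in DATALEVEL_GROUPS.items()
        if path in group_paths
    )

paths = {"/dl1/event/telescope/parameters", "/dl2"}
print(detect_datalevels(paths))  # -> ('DL1_PARAMETERS', 'DL2')
```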

maxnoe commented 8 months ago

Aha! The error here is a missing type check.

get_hdf5_datalevels expects a tables.File as argument. In your example, you passed a pathlib.Path, which also happens to have a .root attribute, but there it is the empty string, so all checks of the form foo in h5file.root return False:

In [4]: f = tables.open_file("../data/datapipe_workshop/model_training/gamma_20deg_0deg___cta-prod5b-lapalma_desert-2158m-LaPalma-dark_cone10.
   ...: alpha_train_cl_merged.DL2.h5")

In [5]: get_hdf5_datalevels(f)
Out[5]: (<DataLevel.DL1_PARAMETERS: 6>, <DataLevel.DL2: 7>)
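
The .root coincidence is easy to confirm: for a relative pathlib.Path, .root is the empty string, and no non-empty string is contained in an empty string:

```python
import pathlib

p = pathlib.Path("gamma_train_cl_merged.DL2.h5")
print(repr(p.root))     # -> ''  (a relative path has no root)
print("dl1" in p.root)  # -> False, as for any non-empty substring
```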
Tobychev commented 8 months ago

Aha! I think a secondary issue is how things are presented in the docs, where get_hdf5_datalevels comes right after the generic read and write functions that do take just a filename.
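
One possible fix, sketched here without depending on pytables: reject path-like arguments up front so the mistake fails loudly instead of silently returning an empty tuple. The function name and error message are hypothetical:

```python
import os
import pathlib

def get_hdf5_datalevels_checked(h5file):
    """Sketch of a guarded entry point (hypothetical name).

    A pathlib.Path also has a .root attribute (the empty string for
    relative paths), so without this guard the group checks would
    silently find nothing.
    """
    if isinstance(h5file, (str, bytes, os.PathLike)):
        raise TypeError(
            "expected an open tables.File, got a path-like object: "
            f"{h5file!r} -- open it with tables.open_file() first"
        )
    # ... the real implementation would inspect h5file.root here ...
    return ()

try:
    get_hdf5_datalevels_checked(pathlib.Path("some_file.DL2.h5"))
except TypeError as e:
    print("rejected:", e)
```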