Open tomeichlersmith opened 2 years ago
Supporting a to_awkward read method would help support some current python analyses
I think borrowing (stealing?) some design principles from uproot would be beneficial, both to align with some current Python analyses and because uproot is a well-designed package.
The main components I see are the following:
The h5py package backing us allows us to avoid the actual disk reading that uproot needs to implement. What this package would need to focus on is the recursive reconstruction of hierarchical data from the "flattened" data that is within the HDF5 file. I assume this will be similar in structure to the h5::Data class.
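To make the reconstruction step concrete, here is a minimal sketch using only the standard library. It assumes the flattened data can be viewed as a mapping from `/`-separated paths to leaf arrays. With h5py, such a mapping could be collected by walking the file with `Group.visititems`. The paths and the helper names `insert` and `nest` are hypothetical, and the actual on-disk layout of fire files may differ.

```python
def insert(node, parts, leaf):
    """Recursively place `leaf` under the nested keys listed in `parts`."""
    head, *rest = parts
    if not rest:
        # Last path component: store the leaf data here.
        node[head] = leaf
        return
    # Descend into (or create) the intermediate group, then recurse.
    insert(node.setdefault(head, {}), rest, leaf)


def nest(flat):
    """Rebuild a nested dict from a mapping of '/'-separated paths to leaves."""
    tree = {}
    for path, leaf in flat.items():
        parts = [p for p in path.split("/") if p]
        if parts:
            insert(tree, parts, leaf)
    return tree


# Hypothetical flattened layout (assumption, not the real fire schema).
flat = {
    "events/hits/energy": [1.0, 2.5, 0.7],
    "events/hits/layer": [0, 3, 1],
    "events/weight": [1.0, 1.0, 1.0],
}
tree = nest(flat)
print(sorted(tree["events"]))  # names of the members under "events"
```

The nested dict is only a stand-in for whatever container the reader ends up returning. The point is that the hierarchy can be recovered from the dataset paths alone.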
The function `ak.zip` is probably what we want. This would allow us to choose the objects to load and zip them together into the ragged-array style of awkward.
`pandas.DataFrame` can simply wrap numpy arrays. Will need to test the performance, but that might work the best.
This is the default return value of `h5py`, so I doubt anything much heavier is needed.
Goal
I want a user to be able to do the following
This will make `fire` explicitly depend on `h5py`, but only at this module level.

I'm thinking the implementation of this would be similar to the current Framework's EventTree module while using `h5py` to access the data sets on disk. Similar to the EventTree module, this would only be designed to read `fire` files. The user could still produce other HDF5 files with direct access to `h5py`, but those files will not be standardized in the way `fire` files are. (This is similar to how ROOT-based Python analyses function as well.)