FAST-HEP / fast-carpenter

Helping turn your trees into tables (i.e. reads ROOT TTrees, writes summary Pandas DataFrames)
https://fast-hep.web.cern.ch

Towards 0.21.0: uproot4, awkward1 and Python >= 3.7 #141

Closed: kreczko closed this 2 years ago

kreczko commented 3 years ago

Creating a dedicated issue for these breaking changes.

Python versions

The three most recent minor releases are 3.9, 3.8 and 3.7. Given that packages like Parsl require Python >= 3.6, should a similar range apply here? The "rule of three" comes to mind, i.e. supporting the last three minor versions, which means Python >= 3.7.
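A minimal sketch of how the "last three minor releases" rule turns into a packaging constraint. The helper names below are hypothetical; `python_requires` is the real setuptools keyword that recent pip versions check before installing.

```python
import sys


def min_supported(latest_minor, window=3):
    """Hypothetical helper: oldest minor version kept under an N-release window.

    With latest_minor=(3, 9) and window=3 this keeps 3.9, 3.8 and 3.7.
    Assumes all versions share the same major number (true for 3.x).
    """
    major, minor = latest_minor
    return (major, minor - window + 1)


def python_requires(latest_minor, window=3):
    """Build a setuptools-style python_requires specifier string."""
    major, minor = min_supported(latest_minor, window)
    return f">={major}.{minor}"


def check_interpreter(minimum):
    """Fail early on interpreters older than the minimum (major, minor)."""
    if sys.version_info[:2] < minimum:
        raise RuntimeError(
            f"Python >= {minimum[0]}.{minimum[1]} required, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )


print(python_requires((3, 9)))  # >=3.7
```

In setup.py this would become `python_requires=">=3.7"`, so a recent pip running on Python 3.6 would pick an older compatible release (or refuse to install) instead of failing at import time.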

uproot4

Multi-tree support still feels a bit odd, but lazy loading (with a cache) is doable if the trees are known in advance, which they are, thanks to the configs. By default all keys are now strings rather than bytestrings (yay!), another reason to move.
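Since the trees are listed in the config, lazy loading can be as simple as a cache keyed by tree name. This is a hypothetical sketch, not fast-carpenter or uproot code: `loader` stands in for whatever reads a tree (with uproot4 perhaps something built on `uproot.open(path)[name]`), and `normalize_key` shows one way to accept uproot3-style bytestring keys during the transition.

```python
from typing import Callable, Dict, Iterable


def normalize_key(key):
    """uproot3 returned keys as bytes (b"Muon_Px"); uproot4 returns str.

    Decoding bytes keeps older configs or caches usable during the migration.
    """
    return key.decode("utf-8") if isinstance(key, bytes) else key


class LazyTrees:
    """Hypothetical sketch: load each tree on first access, then reuse it.

    `loader` is an assumed callable taking a tree name and returning that
    tree's data (for example a dict of branch arrays).
    """

    def __init__(self, tree_names: Iterable, loader: Callable[[str], dict]):
        # Tree names are known up front from the processing config.
        self._names = {normalize_key(n) for n in tree_names}
        self._loader = loader
        self._cache: Dict[str, dict] = {}

    def __getitem__(self, name):
        name = normalize_key(name)
        if name not in self._names:
            raise KeyError(f"tree {name!r} not declared in the config")
        if name not in self._cache:
            # Load once; later requests are served from the cache.
            self._cache[name] = self._loader(name)
        return self._cache[name]
```

The cache never reads a tree that the config does not ask for, which is what makes knowing the trees in advance useful.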

awkward1

This is the real major change, and a risk at the moment. Not all NumPy operations are available yet, which might make working with numexpr challenging. However, Jim is, as always, open to discussions, and missing features can be requested or implemented by us. The big advantage of doing this well is access to GPUs!
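awkward1 stores a jagged array as one flat content buffer plus offsets, and numexpr only ever works on flat buffers. A minimal NumPy sketch of evaluating an element-wise expression that way, assuming all columns share the same offsets; `evaluate_jagged` and `to_lists` are hypothetical helpers, and the NumPy lambda stands in for something like `numexpr.evaluate(expr, local_dict=columns)`.

```python
import numpy as np


def evaluate_jagged(offsets, columns, func):
    """Apply an element-wise function to jagged columns via their flat content.

    Assumes the awkward1 ListOffsetArray layout: event i owns
    content[offsets[i]:offsets[i + 1]], and every column shares the same offsets.
    """
    n_items = int(offsets[-1])
    for name, content in columns.items():
        if len(content) != n_items:
            raise ValueError(f"column {name!r} does not match the offsets")
    # Element-wise maths only needs the flat buffers.
    flat_result = func(**columns)
    # The original offsets still describe the event structure of the result.
    return offsets, flat_result


def to_lists(offsets, content):
    """Rebuild per-event Python lists, for inspection only."""
    return [content[start:stop].tolist() for start, stop in zip(offsets[:-1], offsets[1:])]


# Three events with 2, 0 and 1 muons respectively.
offsets = np.array([0, 2, 2, 3])
columns = {"px": np.array([3.0, 6.0, 5.0]), "py": np.array([4.0, 8.0, 12.0])}

out_offsets, pt = evaluate_jagged(offsets, columns, lambda px, py: np.sqrt(px**2 + py**2))
print(to_lists(out_offsets, pt))  # [[5.0, 10.0], [], [13.0]]
```

Only expressions that act element by element fit this pattern; reductions per event (sums, maxima) need the offsets and cannot be handed to numexpr directly.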

The other interesting bit here is the new zip/unzip functionality. Essentially, you can create a Tree/EventModel for your input data and define behaviors for it, e.g. https://github.com/FAST-HEP/scikit-validate/blob/0.4.0-dev/skvalidate/operations/_awkward.py#L4
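A plain-Python illustration of the zip/unzip idea, not the awkward1 API: parallel field columns become records whose class supplies extra behavior (here a computed `pt`). In awkward1 the corresponding calls are `ak.zip(..., with_name="Muon")` with a class registered in `ak.behavior`, and `ak.unzip`, which operate on whole (possibly jagged) arrays rather than Python objects. The `Muon` class, its fields and the helper names are assumptions.

```python
import math


class Muon:
    """Hypothetical record 'behavior': methods computed from the record's fields."""

    def __init__(self, px, py):
        self.px = px
        self.py = py

    @property
    def pt(self):
        return math.hypot(self.px, self.py)


def zip_fields(fields, cls):
    """Combine parallel per-field lists into records of type `cls` (cf. ak.zip)."""
    names = list(fields)
    if len({len(fields[n]) for n in names}) != 1:
        raise ValueError("all fields must have the same length")
    return [cls(**dict(zip(names, values))) for values in zip(*(fields[n] for n in names))]


def unzip_fields(records, names):
    """Split records back into one list per field (cf. ak.unzip)."""
    return tuple([getattr(r, n) for r in records] for n in names)


muons = zip_fields({"px": [3.0, 6.0], "py": [4.0, 8.0]}, Muon)
print([m.pt for m in muons])  # [5.0, 10.0]
```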

@benkrikler if you are happy with the target version, I will start work in my spare time. It would be good to merge the multi-tree support soon as well ;).

kreczko commented 3 years ago

Summary from first look yesterday:

kreczko commented 3 years ago

For TLorentzVector, I added https://github.com/scikit-hep/vector/pull/19.
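The point of replacing TLorentzVector in a columnar workflow is to compute kinematic quantities on whole arrays at once. A minimal NumPy sketch of a few of them (`pt`, `eta`, `mass` from `px, py, pz, E`); this is not the interface of the vector package or of the linked PR, and `lorentz_kinematics` is a hypothetical name. Unlike `TLorentzVector::M()`, which returns a negative value for space-like vectors, it clips the squared mass at zero.

```python
import numpy as np


def lorentz_kinematics(px, py, pz, energy):
    """Vectorised pt, eta and mass for arrays of four-momenta (hypothetical helper)."""
    px, py, pz, energy = map(np.asarray, (px, py, pz, energy))
    pt = np.hypot(px, py)
    p = np.sqrt(pt**2 + pz**2)
    # Clip small negative values (rounding, space-like inputs) to zero mass.
    mass = np.sqrt(np.clip(energy**2 - p**2, 0.0, None))
    # Pseudorapidity; undefined (inf/nan) where pt == 0.
    eta = np.arcsinh(pz / pt)
    return {"pt": pt, "eta": eta, "mass": mass}


k = lorentz_kinematics([3.0], [4.0], [0.0], [13.0])
print(k["pt"], k["mass"])  # [5.] [12.]
```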

Next is fast_curator with two blocking issues:

  1. https://github.com/scikit-hep/uproot4/issues/197 (currently no uproot4.num_entries)
  2. The CMS file_list.yml cannot be read:
    yaml.constructor.ConstructorError: could not determine a constructor for the tag 'tag:yaml.org,2002:python/long'
    in "file_list.yml", line 6, column 14

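The latter can be worked around with yaml.load(..., Loader=yaml.Loader). yaml.SafeLoader fails because it only constructs standard YAML tags, and python/long is a Python-specific tag (presumably written when the file was dumped under Python 2).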
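A sketch of a safer middle ground than `yaml.Loader`, assuming the only non-standard tag in the file list is `!!python/long`: subclass `yaml.SafeLoader` and teach it that single tag. `CompatSafeLoader`, `load_file_list` and the example keys are hypothetical; `add_constructor` and `construct_scalar` are standard PyYAML methods.

```python
import yaml


class CompatSafeLoader(yaml.SafeLoader):
    """SafeLoader that also accepts the Python 2 '!!python/long' tag."""


def _construct_long(loader, node):
    # A python/long scalar is just an integer written as text.
    return int(loader.construct_scalar(node))


# Registered on the subclass only, so yaml.SafeLoader itself is unchanged.
CompatSafeLoader.add_constructor("tag:yaml.org,2002:python/long", _construct_long)


def load_file_list(text):
    return yaml.load(text, Loader=CompatSafeLoader)


example = "nevents: !!python/long 1000\nfiles: [a.root, b.root]\n"
print(load_file_list(example))  # {'nevents': 1000, 'files': ['a.root', 'b.root']}
```

Registering on a subclass avoids changing behavior for other code in the same process, and any other Python-specific tag still raises a ConstructorError rather than being executed.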