CoffeaTeam / coffea

Basic tools and wrappers for enabling not-too-alien syntax when running columnar Collider HEP analysis.
https://coffeateam.github.io/coffea/
BSD 3-Clause "New" or "Revised" License
128 stars 123 forks source link

Form manipulation issues with uproot 5.3.3 #1080

Open nsmith- opened 3 months ago

nsmith- commented 3 months ago

As found in #1076 with uproot 5.3.3 we have a change in the form passed to the form_mapping callable argument in uproot.dask: with v5.3.2 on opening tests/samples/treemaker.root the list of top-level fields in the record array include:

        "Electrons",
        "Electrons/Electrons.fCoordinates.fPt",
        "Electrons/Electrons.fCoordinates.fEta",
        "Electrons/Electrons.fCoordinates.fPhi",
        "Electrons/Electrons.fCoordinates.fE",
        "Electrons_charge",
        "Electrons_iso",
        "Electrons_mediumID",
        "Electrons_MTW",
        "Electrons_passIso",
        "Electrons_tightID",

and in 5.3.3 we have

        "Electrons",
        "Electrons_charge",
        "Electrons_iso",
        "Electrons_mediumID",
        "Electrons_MTW",
        "Electrons_passIso",
        "Electrons_tightID",

it appears that the individual split sub-branches are no longer listed and only the top-level Electrons branch is provided. Similar issues are present for other branches. The current NanoEvents code parses the split branches to build the schema. This sounds like it may have been due to https://github.com/scikit-hep/uproot5/pull/1189

nsmith- commented 3 months ago

If I revert https://github.com/scikit-hep/uproot5/commit/f13730589142bacf528f2f9aefd1d3548540fdca then the coffea tests pass and both the main branch and its subbranches are listed again

nsmith- commented 3 months ago

I am not sure if the desired behavior is to consider the top-level branch and it's sub-branches as "duplicate" or not. I would prefer to only label things duplicate when they truly have the exact same fully-qualified name and point to different data. But open to other opinions. We can either fix this in uproot or rework coffea NanoEvents to build from the record array Electrons branch (and similar for others), assuming it is backwards compatible. FYI @jpivarski @ioanaif

jpivarski commented 3 months ago

Unfortunately, we've been presenting nested branch data as a flat RecordArray of fields for a long time, in arrays and iterate before dask. A RecordArray can have multiple fields with the same names, but it's my understanding that it breaks dask-awkward.

Could we add a disambiguating suffix (e.g. _0, _1, _2) to uproot.dask field names that would have been the same? If this is only an issue after the point when the form-remapping would take effect (what happened before the duplicates-fix?), maybe we can only do it in the case of no form-remapping?

It sounds like the duplicates-removed code state is affecting more people than the with-duplicates code. If so, I should revoke it, release a new version of Uproot, and figure out the right policy afterward.

lgray commented 2 months ago

Is that release, with the change reverted, going to be made?

matthewfeickert commented 2 months ago

This is still an issue in uproot v5.3.4:

$ uv pip install --upgrade -e . 'uproot<=5.3.4' && pytest --cov-report=xml --cov=coffea --deselect=test_taskvine tests/test_nanoevents_delphes.py -k test_listify
   Built file:///home/feickert/Code/GitHub/scikit-hep/coffea                                                                                                                                                Built 1 editable in 419ms
Resolved 64 packages in 34ms
Installed 2 packages in 2ms
 - coffea==2024.4.2.dev13+gb944b41c (from file:///home/feickert/Code/GitHub/scikit-hep/coffea)
 + coffea==2024.4.2.dev13+gb944b41c (from file:///home/feickert/Code/GitHub/scikit-hep/coffea)
 - uproot==5.3.2
 + uproot==5.3.4
=========================================================================================== test session starts ============================================================================================
platform linux -- Python 3.12.1, pytest-8.2.0, pluggy-1.5.0 -- /home/feickert/.pyenv/versions/3.12.1/envs/coffea-dev/bin/python
cachedir: .pytest_cache
Matplotlib: 3.8.4
Freetype: 2.6.1
rootdir: /home/feickert/Code/GitHub/scikit-hep/coffea
configfile: pyproject.toml
plugins: mpl-0.17.0, cov-5.0.0, typeguard-4.2.1, asyncio-0.23.6, mock-3.14.0
asyncio: mode=Mode.STRICT
collected 70 items / 69 deselected / 1 selected                                                                                                                                                            

tests/test_nanoevents_delphes.py::test_listify ERROR             
jpivarski commented 2 months ago

If scikit-hep/uproot5#1209 fixes the bug, I'll merge it and immediately make a release.

matthewfeickert commented 2 months ago

If scikit-hep/uproot5#1209 fixes the bug, I'll merge it and immediately make a release.

Thanks @jpivarski! That looks like that does it.

$ uv pip install --upgrade git+https://github.com/scikit-hep/uproot5@jpivarski/fixing_ignore_duplicates
$ pytest --cov-report=xml --cov=coffea --deselect=test_taskvine tests/test_nanoevents_delphes.py
=========================================================================================== test session starts ============================================================================================
platform linux -- Python 3.12.1, pytest-8.2.0, pluggy-1.5.0 -- /home/feickert/.pyenv/versions/3.12.1/envs/coffea-dev/bin/python
cachedir: .pytest_cache
Matplotlib: 3.8.4
Freetype: 2.6.1
rootdir: /home/feickert/Code/GitHub/scikit-hep/coffea
configfile: pyproject.toml
plugins: mpl-0.17.0, cov-5.0.0, typeguard-4.2.1, asyncio-0.23.6, mock-3.14.0
asyncio: mode=Mode.STRICT
collected 70 items                                                                                                                                                                                         

tests/test_nanoevents_delphes.py::test_listify PASSED                                                                                                                                                [  1%]
tests/test_nanoevents_delphes.py::test_collection_exists[CaloJet02] PASSED                                                                                                                           [  2%]
tests/test_nanoevents_delphes.py::test_collection_exists[CaloJet04] PASSED                                                                                                                           [  4%]
tests/test_nanoevents_delphes.py::test_collection_exists[CaloJet08] PASSED                                                                                                                           [  5%]
tests/test_nanoevents_delphes.py::test_collection_exists[CaloJet15] PASSED                                                                                                                           [  7%]
tests/test_nanoevents_delphes.py::test_collection_exists[EFlowNeutralHadron] PASSED                                                                                                                  [  8%]
tests/test_nanoevents_delphes.py::test_collection_exists[EFlowPhoton] PASSED                                                                                                                         [ 10%]
tests/test_nanoevents_delphes.py::test_collection_exists[EFlowTrack] PASSED                                                                                                                          [ 11%]
tests/test_nanoevents_delphes.py::test_collection_exists[Electron] PASSED                                                                                                                            [ 12%]
tests/test_nanoevents_delphes.py::test_collection_exists[Event] PASSED                                                                                                                               [ 14%]
tests/test_nanoevents_delphes.py::test_collection_exists[EventLHEF] PASSED                                                                                                                           [ 15%]
tests/test_nanoevents_delphes.py::test_collection_exists[GenJet] PASSED                                                                                                                              [ 17%]
tests/test_nanoevents_delphes.py::test_collection_exists[GenJet02] PASSED                                                                                                                            [ 18%]
tests/test_nanoevents_delphes.py::test_collection_exists[GenJet04] PASSED                                                                                                                            [ 20%]
tests/test_nanoevents_delphes.py::test_collection_exists[GenJet08] PASSED                                                                                                                            [ 21%]
tests/test_nanoevents_delphes.py::test_collection_exists[GenJet15] PASSED                                                                                                                            [ 22%]
tests/test_nanoevents_delphes.py::test_collection_exists[GenMissingET] PASSED                                                                                                                        [ 24%]
tests/test_nanoevents_delphes.py::test_collection_exists[Jet] PASSED                                                                                                                                 [ 25%]
tests/test_nanoevents_delphes.py::test_collection_exists[MissingET] PASSED                                                                                                                           [ 27%]
tests/test_nanoevents_delphes.py::test_collection_exists[Muon] PASSED                                                                                                                                [ 28%]
tests/test_nanoevents_delphes.py::test_collection_exists[Particle] PASSED                                                                                                                            [ 30%]
tests/test_nanoevents_delphes.py::test_collection_exists[ParticleFlowJet02] PASSED                                                                                                                   [ 31%]
tests/test_nanoevents_delphes.py::test_collection_exists[ParticleFlowJet04] PASSED                                                                                                                   [ 32%]
tests/test_nanoevents_delphes.py::test_collection_exists[ParticleFlowJet08] PASSED                                                                                                                   [ 34%]
tests/test_nanoevents_delphes.py::test_collection_exists[ParticleFlowJet15] PASSED                                                                                                                   [ 35%]
tests/test_nanoevents_delphes.py::test_collection_exists[Photon] PASSED                                                                                                                              [ 37%]
tests/test_nanoevents_delphes.py::test_collection_exists[ScalarHT] PASSED                                                                                                                            [ 38%]
tests/test_nanoevents_delphes.py::test_collection_exists[Tower] PASSED                                                                                                                               [ 40%]
tests/test_nanoevents_delphes.py::test_collection_exists[Track] PASSED                                                                                                                               [ 41%]
tests/test_nanoevents_delphes.py::test_collection_exists[TrackJet02] PASSED                                                                                                                          [ 42%]
tests/test_nanoevents_delphes.py::test_collection_exists[TrackJet04] PASSED                                                                                                                          [ 44%]
tests/test_nanoevents_delphes.py::test_collection_exists[TrackJet08] PASSED                                                                                                                          [ 45%]
tests/test_nanoevents_delphes.py::test_collection_exists[TrackJet15] PASSED                                                                                                                          [ 47%]
tests/test_nanoevents_delphes.py::test_collection_exists[WeightLHEF] PASSED                                                                                                                          [ 48%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[CaloJet02] PASSED                                                                                                                       [ 50%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[CaloJet04] PASSED                                                                                                                       [ 51%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[CaloJet08] PASSED                                                                                                                       [ 52%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[CaloJet15] PASSED                                                                                                                       [ 54%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[GenJet] PASSED                                                                                                                          [ 55%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[GenJet02] PASSED                                                                                                                        [ 57%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[GenJet04] PASSED                                                                                                                        [ 58%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[GenJet08] PASSED                                                                                                                        [ 60%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[GenJet15] PASSED                                                                                                                        [ 61%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[Jet] PASSED                                                                                                                             [ 62%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[ParticleFlowJet02] PASSED                                                                                                               [ 64%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[ParticleFlowJet04] PASSED                                                                                                               [ 65%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[ParticleFlowJet08] PASSED                                                                                                               [ 67%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[ParticleFlowJet15] PASSED                                                                                                               [ 68%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[TrackJet02] PASSED                                                                                                                      [ 70%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[TrackJet04] PASSED                                                                                                                      [ 71%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[TrackJet08] PASSED                                                                                                                      [ 72%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[TrackJet15] PASSED                                                                                                                      [ 74%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[CaloJet02] PASSED                                                                                                                [ 75%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[CaloJet04] PASSED                                                                                                                [ 77%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[CaloJet08] PASSED                                                                                                                [ 78%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[CaloJet15] PASSED                                                                                                                [ 80%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[GenJet] PASSED                                                                                                                   [ 81%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[GenJet02] PASSED                                                                                                                 [ 82%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[GenJet04] PASSED                                                                                                                 [ 84%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[GenJet08] PASSED                                                                                                                 [ 85%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[GenJet15] PASSED                                                                                                                 [ 87%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[Jet] PASSED                                                                                                                      [ 88%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[ParticleFlowJet02] PASSED                                                                                                        [ 90%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[ParticleFlowJet04] PASSED                                                                                                        [ 91%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[ParticleFlowJet08] PASSED                                                                                                        [ 92%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[ParticleFlowJet15] PASSED                                                                                                        [ 94%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[TrackJet02] PASSED                                                                                                               [ 95%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[TrackJet04] PASSED                                                                                                               [ 97%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[TrackJet08] PASSED                                                                                                               [ 98%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[TrackJet15] PASSED 

Though I can run on CI to double check. I'll report on https://github.com/scikit-hep/uproot5/pull/1209 once that passes.

matthewfeickert commented 2 months ago

Yup, that did it: https://github.com/scikit-hep/uproot5/pull/1209#pullrequestreview-2044483928