Open nsmith- opened 3 months ago
If I revert https://github.com/scikit-hep/uproot5/commit/f13730589142bacf528f2f9aefd1d3548540fdca then the coffea tests pass and both the main branch and its subbranches are listed again
I am not sure if the desired behavior is to consider the top-level branch and it's sub-branches as "duplicate" or not. I would prefer to only label things duplicate when they truly have the exact same fully-qualified name and point to different data. But open to other opinions. We can either fix this in uproot or rework coffea NanoEvents to build from the record array Electrons
branch (and similar for others), assuming it is backwards compatible.
FYI @jpivarski @ioanaif
Unfortunately, we've been presenting nested branch data as a flat RecordArray of fields for a long time, in arrays
and iterate
before dask
. A RecordArray can have multiple fields with the same names, but it's my understanding that it breaks dask-awkward.
Could we add a disambiguating suffix (e.g. _0
, _1
, _2
) to uproot.dask
field names that would have been the same? If this is only an issue after the point when the form-remapping would take effect (what happened before the duplicates-fix?), maybe we can only do it in the case of no form-remapping?
It sounds like the duplicates-removed code state is affecting more people than the with-duplicates code. If so, I should revoke it, release a new version of Uproot, and figure out the right policy afterward.
Is that release, with the change reverted, going to be made?
This is still an issue in uproot
v5.3.4
:
$ uv pip install --upgrade -e . 'uproot<=5.3.4' && pytest --cov-report=xml --cov=coffea --deselect=test_taskvine tests/test_nanoevents_delphes.py -k test_listify
Built file:///home/feickert/Code/GitHub/scikit-hep/coffea Built 1 editable in 419ms
Resolved 64 packages in 34ms
Installed 2 packages in 2ms
- coffea==2024.4.2.dev13+gb944b41c (from file:///home/feickert/Code/GitHub/scikit-hep/coffea)
+ coffea==2024.4.2.dev13+gb944b41c (from file:///home/feickert/Code/GitHub/scikit-hep/coffea)
- uproot==5.3.2
+ uproot==5.3.4
=========================================================================================== test session starts ============================================================================================
platform linux -- Python 3.12.1, pytest-8.2.0, pluggy-1.5.0 -- /home/feickert/.pyenv/versions/3.12.1/envs/coffea-dev/bin/python
cachedir: .pytest_cache
Matplotlib: 3.8.4
Freetype: 2.6.1
rootdir: /home/feickert/Code/GitHub/scikit-hep/coffea
configfile: pyproject.toml
plugins: mpl-0.17.0, cov-5.0.0, typeguard-4.2.1, asyncio-0.23.6, mock-3.14.0
asyncio: mode=Mode.STRICT
collected 70 items / 69 deselected / 1 selected
tests/test_nanoevents_delphes.py::test_listify ERROR
If scikit-hep/uproot5#1209 fixes the bug, I'll merge it and immediately make a release.
If scikit-hep/uproot5#1209 fixes the bug, I'll merge it and immediately make a release.
Thanks @jpivarski! That looks like that does it.
$ uv pip install --upgrade git+https://github.com/scikit-hep/uproot5@jpivarski/fixing_ignore_duplicates
$ pytest --cov-report=xml --cov=coffea --deselect=test_taskvine tests/test_nanoevents_delphes.py
=========================================================================================== test session starts ============================================================================================
platform linux -- Python 3.12.1, pytest-8.2.0, pluggy-1.5.0 -- /home/feickert/.pyenv/versions/3.12.1/envs/coffea-dev/bin/python
cachedir: .pytest_cache
Matplotlib: 3.8.4
Freetype: 2.6.1
rootdir: /home/feickert/Code/GitHub/scikit-hep/coffea
configfile: pyproject.toml
plugins: mpl-0.17.0, cov-5.0.0, typeguard-4.2.1, asyncio-0.23.6, mock-3.14.0
asyncio: mode=Mode.STRICT
collected 70 items
tests/test_nanoevents_delphes.py::test_listify PASSED [ 1%]
tests/test_nanoevents_delphes.py::test_collection_exists[CaloJet02] PASSED [ 2%]
tests/test_nanoevents_delphes.py::test_collection_exists[CaloJet04] PASSED [ 4%]
tests/test_nanoevents_delphes.py::test_collection_exists[CaloJet08] PASSED [ 5%]
tests/test_nanoevents_delphes.py::test_collection_exists[CaloJet15] PASSED [ 7%]
tests/test_nanoevents_delphes.py::test_collection_exists[EFlowNeutralHadron] PASSED [ 8%]
tests/test_nanoevents_delphes.py::test_collection_exists[EFlowPhoton] PASSED [ 10%]
tests/test_nanoevents_delphes.py::test_collection_exists[EFlowTrack] PASSED [ 11%]
tests/test_nanoevents_delphes.py::test_collection_exists[Electron] PASSED [ 12%]
tests/test_nanoevents_delphes.py::test_collection_exists[Event] PASSED [ 14%]
tests/test_nanoevents_delphes.py::test_collection_exists[EventLHEF] PASSED [ 15%]
tests/test_nanoevents_delphes.py::test_collection_exists[GenJet] PASSED [ 17%]
tests/test_nanoevents_delphes.py::test_collection_exists[GenJet02] PASSED [ 18%]
tests/test_nanoevents_delphes.py::test_collection_exists[GenJet04] PASSED [ 20%]
tests/test_nanoevents_delphes.py::test_collection_exists[GenJet08] PASSED [ 21%]
tests/test_nanoevents_delphes.py::test_collection_exists[GenJet15] PASSED [ 22%]
tests/test_nanoevents_delphes.py::test_collection_exists[GenMissingET] PASSED [ 24%]
tests/test_nanoevents_delphes.py::test_collection_exists[Jet] PASSED [ 25%]
tests/test_nanoevents_delphes.py::test_collection_exists[MissingET] PASSED [ 27%]
tests/test_nanoevents_delphes.py::test_collection_exists[Muon] PASSED [ 28%]
tests/test_nanoevents_delphes.py::test_collection_exists[Particle] PASSED [ 30%]
tests/test_nanoevents_delphes.py::test_collection_exists[ParticleFlowJet02] PASSED [ 31%]
tests/test_nanoevents_delphes.py::test_collection_exists[ParticleFlowJet04] PASSED [ 32%]
tests/test_nanoevents_delphes.py::test_collection_exists[ParticleFlowJet08] PASSED [ 34%]
tests/test_nanoevents_delphes.py::test_collection_exists[ParticleFlowJet15] PASSED [ 35%]
tests/test_nanoevents_delphes.py::test_collection_exists[Photon] PASSED [ 37%]
tests/test_nanoevents_delphes.py::test_collection_exists[ScalarHT] PASSED [ 38%]
tests/test_nanoevents_delphes.py::test_collection_exists[Tower] PASSED [ 40%]
tests/test_nanoevents_delphes.py::test_collection_exists[Track] PASSED [ 41%]
tests/test_nanoevents_delphes.py::test_collection_exists[TrackJet02] PASSED [ 42%]
tests/test_nanoevents_delphes.py::test_collection_exists[TrackJet04] PASSED [ 44%]
tests/test_nanoevents_delphes.py::test_collection_exists[TrackJet08] PASSED [ 45%]
tests/test_nanoevents_delphes.py::test_collection_exists[TrackJet15] PASSED [ 47%]
tests/test_nanoevents_delphes.py::test_collection_exists[WeightLHEF] PASSED [ 48%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[CaloJet02] PASSED [ 50%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[CaloJet04] PASSED [ 51%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[CaloJet08] PASSED [ 52%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[CaloJet15] PASSED [ 54%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[GenJet] PASSED [ 55%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[GenJet02] PASSED [ 57%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[GenJet04] PASSED [ 58%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[GenJet08] PASSED [ 60%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[GenJet15] PASSED [ 61%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[Jet] PASSED [ 62%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[ParticleFlowJet02] PASSED [ 64%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[ParticleFlowJet04] PASSED [ 65%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[ParticleFlowJet08] PASSED [ 67%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[ParticleFlowJet15] PASSED [ 68%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[TrackJet02] PASSED [ 70%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[TrackJet04] PASSED [ 71%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[TrackJet08] PASSED [ 72%]
tests/test_nanoevents_delphes.py::test_lorentz_vectorization[TrackJet15] PASSED [ 74%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[CaloJet02] PASSED [ 75%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[CaloJet04] PASSED [ 77%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[CaloJet08] PASSED [ 78%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[CaloJet15] PASSED [ 80%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[GenJet] PASSED [ 81%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[GenJet02] PASSED [ 82%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[GenJet04] PASSED [ 84%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[GenJet08] PASSED [ 85%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[GenJet15] PASSED [ 87%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[Jet] PASSED [ 88%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[ParticleFlowJet02] PASSED [ 90%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[ParticleFlowJet04] PASSED [ 91%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[ParticleFlowJet08] PASSED [ 92%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[ParticleFlowJet15] PASSED [ 94%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[TrackJet02] PASSED [ 95%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[TrackJet04] PASSED [ 97%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[TrackJet08] PASSED [ 98%]
tests/test_nanoevents_delphes.py::test_nested_lorentz_vectorization[TrackJet15] PASSED
Though I can run on CI to double check. I'll report on https://github.com/scikit-hep/uproot5/pull/1209 once that passes.
As found in #1076 with uproot 5.3.3 we have a change in the form passed to the
form_mapping
callable argument inuproot.dask
: with v5.3.2 on openingtests/samples/treemaker.root
the list of top-level fields in the record array include:and in 5.3.3 we have
it appears that the individual split sub-branches are no longer listed and only the top-level
Electrons
branch is provided. Similar issues are present for other branches. The current NanoEvents code parses the split branches to build the schema. This sounds like it may have been due to https://github.com/scikit-hep/uproot5/pull/1189