dask-contrib / dask-awkward

Native Dask collection for awkward arrays, and the library to use it.
https://dask-awkward.readthedocs.io
BSD 3-Clause "New" or "Revised" License

`dak.num` on `axis=0` leads to `TypeError: '>' not supported between instances of 'tuple' and 'int'` #394

Closed: ikrommyd closed this issue 1 year ago.

ikrommyd commented 1 year ago

@JaLuka98 posted this issue on the CMS experiment's Mattermost.

import awkward as ak
from coffea.nanoevents import NanoAODSchema, NanoEventsFactory

events = NanoEventsFactory.from_root(
    {
        "https://github.com/CoffeaTeam/coffea/raw/master/tests/samples/nano_dy.root": "Events"
    },
    schemaclass=NanoAODSchema,
    permit_dask=True,
).events()
ak.num(events.Electron, axis=0).compute()

This snippet fails with

Traceback (most recent call last):
  File "/Users/iason/fun/egamma_dev/egamma-tnp/bug.py", line 9, in <module>
    ak.num(events.Electron, axis=0).compute()
  File "/Users/iason/fun/egamma_dev/.venv/lib/python3.10/site-packages/dask/base.py", line 342, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/Users/iason/fun/egamma_dev/.venv/lib/python3.10/site-packages/dask/base.py", line 621, in compute
    dsk = collections_to_dsk(collections, optimize_graph, **kwargs)
  File "/Users/iason/fun/egamma_dev/.venv/lib/python3.10/site-packages/dask/base.py", line 394, in collections_to_dsk
    dsk = opt(dsk, keys, **kwargs)
  File "/Users/iason/fun/egamma_dev/.venv/lib/python3.10/site-packages/dask_awkward/lib/optimize.py", line 53, in all_optimizations
    dsk = optimize(dsk, keys=keys)
  File "/Users/iason/fun/egamma_dev/.venv/lib/python3.10/site-packages/dask_awkward/lib/optimize.py", line 80, in optimize
    dsk = optimize_columns(dsk)
  File "/Users/iason/fun/egamma_dev/.venv/lib/python3.10/site-packages/dask_awkward/lib/optimize.py", line 198, in optimize_columns
    projection_data = _prepare_buffer_projection(dsk)
  File "/Users/iason/fun/egamma_dev/.venv/lib/python3.10/site-packages/dask_awkward/lib/optimize.py", line 113, in _prepare_buffer_projection
    projection_layers[name] = lay.mock()
  File "/Users/iason/fun/egamma_dev/.venv/lib/python3.10/site-packages/dask_awkward/layers/layers.py", line 282, in mock
    if len(task) == 2 and task[1] > 0:
TypeError: '>' not supported between instances of 'tuple' and 'int'

It only fails with NanoEvents; I couldn't reproduce it by reading the file with uproot.dask alone.
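The last frame of the traceback shows why this is a hard error: Python 3 defines no ordering between a tuple and an int, so `task[1] > 0` raises as soon as the second element of a two-element task is a tuple. The sketch below reproduces that in plain Python and shows one possible type guard. The task tuples, `read_partition`, and both check functions are hypothetical illustrations, not the dask-awkward code or the actual fix.

```python
# Hypothetical two-element tasks shaped like (function, argument).
# None of these names come from dask-awkward; they are illustrative only.
def read_partition(arg):
    return arg


task_int = (read_partition, 3)         # second element is an int
task_tuple = (read_partition, (0, 1))  # second element is a tuple


def naive_check(task):
    # Mirrors the failing line from the traceback.
    return len(task) == 2 and task[1] > 0


def guarded_check(task):
    # Hypothetical guard: only compare when the second element is an int.
    return len(task) == 2 and isinstance(task[1], int) and task[1] > 0


print(naive_check(task_int))  # True
try:
    naive_check(task_tuple)
except TypeError as exc:
    print(exc)  # '>' not supported between instances of 'tuple' and 'int'
print(guarded_check(task_tuple))  # False
```

The guard short-circuits before the comparison, so tuple-valued arguments are simply treated as "not a positive int" instead of raising.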

agoose77 commented 1 year ago

I can't reproduce this locally. @iasonkrom, did you say that you can reproduce it?

ikrommyd commented 1 year ago

Yup, I only said I can't reproduce it with just uproot.dask, without using NanoEvents.

ikrommyd commented 1 year ago

@agoose77, these are the package versions:

(.venv) iason@coffeabox:~/fun/egamma_dev/egamma-tnp$ pip list | grep 'awkward\|dask\|uproot'
awkward                   2.4.6
awkward-cpp               24
dask                      2023.10.0
dask-awkward              2023.10.0
dask-histogram            2023.10.0
uproot                    5.1.2

agoose77 commented 1 year ago

This is fixed by 801aec7c3b494764df1a23d776a0c5e60c08754c

We haven't had a release yet, though. @douglasdavis could you make a patch? :)

ikrommyd commented 1 year ago

Ah, I’m sorry then. Should have checked for that. Thanks. Please close the issue once a patch is out.

douglasdavis commented 1 year ago

Yes! 2023.10.1 is on PyPI. Glad to see type checking led to a bug fix. Sorry for not putting out a release more quickly; the bug hadn't surfaced in the wild yet.
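To confirm that an environment has picked up the patch release, one can compare the installed dask-awkward version against 2023.10.1. This is a minimal sketch assuming plain `YYYY.MM.PATCH` version strings; `parse_calver` and `has_fix` are hypothetical helper names, and pre-release or dev suffixes are not handled.

```python
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (2023, 10, 1)  # patch release announced in this thread


def parse_calver(text):
    # Hypothetical helper: assumes a plain "YYYY.MM.PATCH" string.
    return tuple(int(part) for part in text.split(".")[:3])


def has_fix(installed):
    # Hypothetical helper: True if the installed version is at least 2023.10.1.
    return parse_calver(installed) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = version("dask-awkward")
    except PackageNotFoundError:
        print("dask-awkward is not installed")
    else:
        print(installed, "includes the fix:", has_fix(installed))
```

For production use, `packaging.version.Version` would be a more robust comparison, since it also understands pre-release and dev tags.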

JaLuka98 commented 1 year ago

Thanks from my side as well to everyone for addressing this so quickly; it helps me a lot!

agoose77 commented 1 year ago

> Ah, I’m sorry then. Should have checked for that. Thanks. Please close the issue once a patch is out.

No apology needed; it took me a second to find the ref that fixed it.