jbrewster7 opened 2 months ago
Hello, can you post some of the code that causes this behavior? If you can isolate it in a simple reproducer, that will help us identify the cause more quickly.
I saw similar behavior while doing the coffea-casa scale tests a few weeks ago. Very small chunk sizes (initially a bug where I accidentally passed an O(100) number as the step size instead of steps_per_file), presumably small fractions of TBasket sizes, seem to lead to a serious struggle. I haven't followed up on that yet (and probably can't for the next couple of weeks), but I intend to scan over it for v1.1 of my simple-benchmark code. A sketch of the mix-up is below.
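For illustration, a minimal sketch of the two options being confused, using `uproot.dask` directly; the file name is hypothetical and the comments state assumptions, not measured behavior:

```python
import uproot

# Hypothetical file; "Events" is the tree name.
files = {"sample.root": "Events"}

# step_size: number of entries per partition. An O(100) value here means
# each partition covers only ~100 events -- presumably a small fraction of
# a typical TBasket, so many tasks touch the same baskets.
tiny_partitions = uproot.dask(files, step_size=100)

# steps_per_file: number of partitions per file, so the partition size
# depends on how many entries the file holds. The same O(100) number means
# something very different here -- easy to mix the two up.
coarse_partitions = uproot.dask(files, steps_per_file=100)
```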
Hello, I am using the `coffea.nanoevents.NanoEventsFactory.from_root` function from coffea 2024.5.0, and I am specifying chunking as defined in https://github.com/scikit-hep/uproot5/blob/v5.1.2/src/uproot/_dask.py#L109-L132 (as suggested in coffea). I am running this on lxplus with files in the `eos` folder, accessed via `xrootd`. I am running into something that I find odd, though it may just be behaving differently than I expect.

Initially, I arbitrarily chose chunks of 10000 events (equivalent to about 16 MB in the ROOT file). This worked until I moved to a larger number of files. With more total files, my RAM would fill up and my script would crash when computing with `dask.compute()`. When I used smaller chunks, my RAM filled up and the crash came faster (the smaller I made the chunks, the faster it crashed). I ended up having to increase my chunk size by a factor of 10 for it not to crash.

Could this be happening because, with chunks this small, the amount of file I/O required overwhelms the RAM? Or is this possibly a bug in either coffea or uproot? A rough sketch of my setup is below.
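Roughly, the setup looks like the following. The EOS path is a placeholder, `NanoAODSchema` and the `uproot_options` pass-through of `step_size` are my assumptions about the relevant configuration, and the reduction is a trivial stand-in for the actual analysis:

```python
import dask
import dask_awkward as dak
from coffea.nanoevents import NanoEventsFactory, NanoAODSchema  # schema assumed

# Placeholder EOS path, read over xrootd; in practice this is a long list.
files = {"root://eosuser.cern.ch//eos/user/s/someuser/sample.root": "Events"}

events = NanoEventsFactory.from_root(
    files,
    schemaclass=NanoAODSchema,
    # Chunking option from the linked uproot _dask.py docstring, assumed to
    # be forwarded to uproot.dask: 10000 events per chunk (~16 MB in the
    # ROOT file) was the original choice that worked for a few files.
    uproot_options={"step_size": 10_000},
).events()

# Trivial reduction standing in for the actual analysis; the crash
# happens here, at compute time.
(total,) = dask.compute(dak.sum(events.Muon.pt))
```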
Thanks for your help!