CoffeaTeam / coffea

Basic tools and wrappers for enabling not-too-alien syntax when running columnar Collider HEP analysis.
https://coffeateam.github.io/coffea/
BSD 3-Clause "New" or "Revised" License

Tracking issue for 0.7.0 #396

Status: Closed (lgray closed this issue 11 months ago)

lgray commented 3 years ago

@aperloff @kondratyevd @jrueb - if you want to help with the accuracy tests for JME I'd really appreciate it.

Most of these are handled in #386, and this is likely incomplete. @nsmith- @areinsvo @mcremone please feel free to add items here, and we can discuss further in comments.

Possible additions:

lgray commented 3 years ago

@nsmith- can you check there's no documentation referencing ak0 specifically? Then we can check off the last item.

ivotron commented 3 years ago

Hi @nsmith- and @lgray, a quick follow-up question regarding Arrow/Parquet: will it be possible to read Parquet (nanoevent) files through Arrow's dataset API? Thanks!

lgray commented 3 years ago

If you can provide an example of the generated data, we can check. If the dataset API returns Arrow tables, you're likely very close to done.

ivotron commented 3 years ago

We don't actually have any data in Parquet yet. Is it possible to convert from ROOT to Parquet and then read those files?

lgray commented 3 years ago

Yep, you should now be able to sanely transform ROOT into Parquet with uproot + awkward. The issues with correctly preserving nullability are fixed (so far as we know).

Converting NanoAOD shouldn't be a huge deal.

nsmith- commented 3 years ago

I looked around for awkward0 references and found only an old link, which I updated in https://github.com/CoffeaTeam/coffea/pull/403/commits/5afc91815b256366f8e988ef39757347f54b5359

lgray commented 3 years ago

awkward0 doesn't convert things in a sane way for our purposes (the nullability and some other details are all wrong).

Probably better to cook an awkward1 converter and stick it in util.

matthewfeickert commented 2 years ago

:wave: Hi. It seems that coffea is currently the only thing keeping uproot3, uproot3_methods, and awkward0 in LCG dev4[^1] (cf. https://sft.its.cern.ch/jira/browse/SPI-2229?focusedCommentId=115655&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-115655). Does coffea v0.7.19+ still depend deeply on uproot3 (I see that there was work to minimize this in PR #386), or could the dependency be removed, with some contributions from IRIS-HEP people, in favor of uproot v4?

[^1]: Let it be noted that I don't generally suggest that people go for LCG views if they don't have to, but I'm trying to update a bunch of Scikit-HEP/IRIS-HEP packages in LCG dev4 in general.

lgray commented 2 years ago

@matthewfeickert coffea does not depend on uproot3 that deeply, but uproot3 provides some behaviors that certain users expect. It will go away in coffea 0.8.0 (~start of 2023).

nsmith- commented 2 years ago

It is only used for dumping coffea hist objects into ROOT files: https://github.com/CoffeaTeam/coffea/blob/ec7601b0cc09b9f2bb9217f7cec8d6d6d15a2256/coffea/hist/export.py#L38-L43 Unfortunately, the current API assumes the user creates the output file with uproot3, so it cannot just be silently upgraded to uproot4. Perhaps we can move the needed import and class definitions inside the function, to at least remove the install-time dependency.
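
A minimal sketch of the deferred-import idea suggested above, using only the standard library. The names `_require` and `export1d` here are hypothetical stand-ins (this is not the code in `coffea/hist/export.py`), and the function only returns what it would hand to the backend instead of building a real histogram object.

```python
import importlib


def _require(module_name, feature):
    """Import an optional dependency at call time, with a readable error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{feature} needs the optional package '{module_name}'"
        ) from err


def export1d(counts, edges, backend="uproot3_methods"):
    """Hypothetical exporter: validate the input, then load the backend lazily."""
    if len(edges) != len(counts) + 1:
        raise ValueError("expected one more bin edge than bin counts")
    backend_module = _require(backend, "export1d")
    # A real exporter would build the backend's histogram class here; this
    # sketch only returns the pieces it would pass along.
    return backend_module.__name__, list(counts), list(edges)
```

With this layout, importing the module that defines `export1d` never touches the backend, so the package installs and imports cleanly without it; only callers of `export1d` need the optional dependency.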

lgray commented 2 years ago

All of coffea.hist is being removed, so we just moved that export into an uproot 4/5 style implementation in util.