cta-observatory / pyeventio

Python read-only implementation for the EventIO data format used by the CORSIKA 7 IACT extension and sim_telarray
MIT License

Evaluate using awkwardarray for variable length arrays #223

Open maxnoe opened 3 years ago

maxnoe commented 3 years ago

We should evaluate whether awkward can speed things up in the places where we currently use lists of lists or arrays of arrays to hold variable-length data.

https://awkward-array.org/quickstart.html

kosack commented 3 years ago

I tried some experiments with it in ctapipe about a year ago, but decided in the end that it would require too much redesign, although that depends on where and how heavily we would use it. It also has implications (and benefits) for the data format, e.g. the ability to write these variable-length arrays to Parquet.

It's interesting technology though, and could help solve the problem of event-wise vs bunch-of-event processing. Ideally we would like to support the latter and do away with all explicit loops over events for efficiency purposes, but the current design makes that difficult. My main issue with awkward was just that it was not very stable, but now that there is a 1.x release, that is encouraging.
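
As a rough illustration of event-wise vs bunch-of-event processing, the sketch below stores variable-length data as one flat array plus per-event counts (similar in spirit to the layout awkward uses) and compares an explicit loop with a batched NumPy computation. The function names and data are hypothetical and not part of ctapipe or pyeventio.

```python
import numpy as np

# Hypothetical data: all values of all events concatenated, plus the
# number of values belonging to each event (the second event is empty).
flat = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
counts = np.array([2, 0, 3])


def sums_event_wise(flat, counts):
    """Sum the values of each event with an explicit loop over events."""
    out = []
    start = 0
    for n in counts:
        out.append(flat[start:start + n].sum())
        start += n
    return np.array(out)


def sums_batched(flat, counts):
    """Sum the values of each event for the whole bunch at once."""
    # Label every value with the index of the event it belongs to.
    event_index = np.repeat(np.arange(len(counts)), counts)
    # Weighted bincount adds up the values per event; empty events get 0.
    return np.bincount(event_index, weights=flat, minlength=len(counts))


print(sums_event_wise(flat, counts))
print(sums_batched(flat, counts))
```

Both functions return the same per-event sums; the batched version simply avoids the Python-level loop over events.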

Places where it could be interesting to explore using it would be:

maxnoe commented 3 years ago

Here I was mainly talking about the places where sim_telarray uses variable-length data, which have quite a large performance impact when reading EventIO files.

These are at the moment:

kosack commented 3 years ago

Perhaps a similar issue should be opened for pyeventio, since there it's clearly useful.

maxnoe commented 3 years ago

Sorry, I misclicked. I intended this to be an eventio issue.