cta-observatory / pycorsikaio

Python reader for CORSIKA binary file format
MIT License
9 stars 2 forks source link

Add option to not parse blocks #21

Closed maxnoe closed 1 year ago

maxnoe commented 2 years ago

When loading large files in-bulk, it's much faster to accumulate the arrays and then parse the low-level float arrays then parsing each event directly.

Added an option to just keep the float array in the event loop.

codecov[bot] commented 2 years ago

Codecov Report

Base: 96.09% // Head: 95.98% // Decreases project coverage by -0.12% :warning:

Coverage data is based on head (aae021f) compared to base (e13ef5f). Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #21 +/- ## ========================================== - Coverage 96.09% 95.98% -0.12% ========================================== Files 19 20 +1 Lines 384 473 +89 ========================================== + Hits 369 454 +85 - Misses 15 19 +4 ``` | [Impacted Files](https://codecov.io/gh/cta-observatory/pycorsikaio/pull/21?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cta-observatory) | Coverage Δ | | |---|---|---| | [corsikaio/file.py](https://codecov.io/gh/cta-observatory/pycorsikaio/pull/21?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cta-observatory#diff-Y29yc2lrYWlvL2ZpbGUucHk=) | `97.54% <100.00%> (+0.19%)` | :arrow_up: | | [tests/test\_file.py](https://codecov.io/gh/cta-observatory/pycorsikaio/pull/21?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cta-observatory#diff-dGVzdHMvdGVzdF9maWxlLnB5) | `100.00% <100.00%> (ø)` | | | [corsikaio/subblocks/run\_header.py](https://codecov.io/gh/cta-observatory/pycorsikaio/pull/21?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cta-observatory#diff-Y29yc2lrYWlvL3N1YmJsb2Nrcy9ydW5faGVhZGVyLnB5) | `84.00% <0.00%> (-3.50%)` | :arrow_down: | | [corsikaio/subblocks/data.py](https://codecov.io/gh/cta-observatory/pycorsikaio/pull/21?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cta-observatory#diff-Y29yc2lrYWlvL3N1YmJsb2Nrcy9kYXRhLnB5) | `100.00% <0.00%> (ø)` | | | [corsikaio/subblocks/dtypes.py](https://codecov.io/gh/cta-observatory/pycorsikaio/pull/21?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cta-observatory#diff-Y29yc2lrYWlvL3N1YmJsb2Nrcy9kdHlwZXMucHk=) | `100.00% <0.00%> (ø)` | | | [corsikaio/subblocks/run\_end.py](https://codecov.io/gh/cta-observatory/pycorsikaio/pull/21?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cta-observatory#diff-Y29yc2lrYWlvL3N1YmJsb2Nrcy9ydW5fZW5kLnB5) | `100.00% <0.00%> (ø)` | | | [corsikaio/subblocks/\_\_init\_\_.py](https://codecov.io/gh/cta-observatory/pycorsikaio/pull/21?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cta-observatory#diff-Y29yc2lrYWlvL3N1YmJsb2Nrcy9fX2luaXRfXy5weQ==) | `100.00% <0.00%> (ø)` | | | [corsikaio/subblocks/event\_end.py](https://codecov.io/gh/cta-observatory/pycorsikaio/pull/21?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cta-observatory#diff-Y29yc2lrYWlvL3N1YmJsb2Nrcy9ldmVudF9lbmQucHk=) | `100.00% <0.00%> (ø)` | | | [corsikaio/subblocks/longitudinal.py](https://codecov.io/gh/cta-observatory/pycorsikaio/pull/21?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cta-observatory#diff-Y29yc2lrYWlvL3N1YmJsb2Nrcy9sb25naXR1ZGluYWwucHk=) | `100.00% <0.00%> (ø)` | | | [tests/test\_units.py](https://codecov.io/gh/cta-observatory/pycorsikaio/pull/21?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cta-observatory#diff-dGVzdHMvdGVzdF91bml0cy5weQ==) | `100.00% <0.00%> (ø)` | | | ... and [1 more](https://codecov.io/gh/cta-observatory/pycorsikaio/pull/21?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cta-observatory) | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cta-observatory). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=cta-observatory)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

The-Ludwig commented 1 year ago

Hi @maxnoe! I played around with this in the last week and successfully use this in https://github.com/The-Ludwig/PANAMA . I parse CORSIKA DAT files to pandas dataframes and using noparse, I am way faster.

I think this is fine to merge, actually, should I add a test?

Only comment I have is: also checking noparse in the derived classes from CorsikaFile seems unneeded, since

maxnoe commented 1 year ago

@orelgueta Could you give a quick review here, it's a nice-to-have feature for @The-Ludwig and shouldn't interfere with our usage

The-Ludwig commented 1 year ago

@orelgueta Yes, that is correct. In my testing I am around 5 times faster if I don't parse the particle blocks, put them into a python list, make a pandas dataframe out of them and then name the columns. Of course it depends on the size and structure of the file itself, but there are definitely some good use-cases.

maxnoe commented 1 year ago

Not using their own code, the functions here can be used. The difference is basically that for the use case of reading all events in a file into a single data structure, instead of parsing n arrays and then stacking you stack n simple arrays first and then parse once.