Closed by maxnoe 1 year ago
Base: 96.09% // Head: 95.98% // Decreases project coverage by -0.12% :warning:
Coverage data is based on head (aae021f) compared to base (e13ef5f). Patch coverage: 100.00% of modified lines in pull request are covered.
Hi @maxnoe! I played around with this over the last week and am using it successfully in https://github.com/The-Ludwig/PANAMA . I parse CORSIKA DAT files into pandas dataframes, and with noparse I am way faster.
I think this is fine to merge, actually. Should I add a test?
Only comment I have is: also checking noparse in the derived classes from CorsikaFile seems unneeded, since
@orelgueta Could you give a quick review here? It's a nice-to-have feature for @The-Ludwig and shouldn't interfere with our usage.
@orelgueta Yes, that is correct. In my testing I am around 5 times faster if I don't parse the particle blocks, put them into a Python list, make a pandas dataframe out of them and then name the columns. Of course it depends on the size and structure of the file itself, but there are definitely some good use cases.
Instead of using their own code, they can use the functions here. The difference is basically that, for the use case of reading all events in a file into a single data structure, instead of parsing n arrays and then stacking them, you stack n simple arrays first and then parse once.
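To illustrate the difference, here is a minimal NumPy sketch of "parse n arrays, then stack" versus "stack n raw arrays, then parse once". It does not use this library's API: the record layout `PARTICLE_DTYPE` and the helpers `parse_particles`, `parse_per_event` and `parse_in_bulk` are hypothetical names made up for the example, and the raw blocks are random numbers rather than data read from a CORSIKA file.

```python
import numpy as np

# Hypothetical record layout (an assumption for this sketch):
# seven float32 values per particle.
PARTICLE_DTYPE = np.dtype([
    ("particle_description", np.float32),
    ("px", np.float32),
    ("py", np.float32),
    ("pz", np.float32),
    ("x", np.float32),
    ("y", np.float32),
    ("t", np.float32),
])
N_FIELDS = len(PARTICLE_DTYPE.names)


def parse_particles(raw):
    """Reinterpret a flat float32 array as structured particle records."""
    raw = np.ascontiguousarray(raw, dtype=np.float32)
    return raw.view(PARTICLE_DTYPE)


def parse_per_event(raw_events):
    """Parse every event on its own, then stack the parsed arrays."""
    return np.concatenate([parse_particles(raw) for raw in raw_events])


def parse_in_bulk(raw_events):
    """Stack the raw float arrays first, then parse a single time."""
    return parse_particles(np.concatenate(raw_events))


# Stand-in for the raw float arrays kept in the event loop:
# three "events" with 3, 5 and 2 particles.
rng = np.random.default_rng(0)
raw_events = [rng.random(n * N_FIELDS, dtype=np.float32) for n in (3, 5, 2)]

per_event = parse_per_event(raw_events)
bulk = parse_in_bulk(raw_events)

# Both routes yield identical records; the bulk route parses only once.
assert per_event.tobytes() == bulk.tobytes()
```

Both functions expect a non-empty list of raw arrays. How much time the bulk route saves depends on how expensive the real per-event parsing is, so this sketch only shows that the two orderings give the same result.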
When loading large files in bulk, it's much faster to accumulate the arrays and then parse the low-level float arrays than to parse each event directly.
Added an option to just keep the float array in the event loop.