edbennett / glue_analysis


Performance read binary #42

chillenzer closed this 11 months ago

chillenzer commented 11 months ago

Description

Moving the indexing columns into a pd.MultiIndex for performance reasons. This entails minor adaptations in the functional code, moderate adaptations in the validation code, and major adaptations in the test data/cases. It also uses an integer enumeration for the operator index and the blocking level index. It could be nice to provide a function that translates these, e.g. for finding a particular operator-blocking combination, but that is likely not necessary for the use case we currently have.
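A minimal sketch of the idea (the column names Internal and Blocking_index follow the profiling discussion below; the actual set of index columns in the reader may differ):

import pandas as pd

# Before: indexing information lives in ordinary columns, so every
# lookup goes through boolean masks and copies data.
df = pd.DataFrame({
    "Internal": [0, 0, 1, 1],        # integer-enumerated operator index
    "Blocking_index": [0, 1, 0, 1],  # integer-enumerated blocking level
    "value": [1.0, 2.0, 3.0, 4.0],
})

# After: move the indexing columns into a pd.MultiIndex once ...
df = df.set_index(["Internal", "Blocking_index"])

# ... so lookups become cheap index operations, e.g. operator 1 at
# blocking level 0:
row = df.loc[(1, 0)]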

Profiles

I've profiled the following code:

from glue_analysis.readers.read_binary import read_correlators_binary

corr_filename = "tmp.data/corr0RPpR.dat.2"  # 394MB of data
vev_filename = "tmp.data/avac0.dat.2"  # 168KB of data
correlators = read_correlators_binary(corr_filename, vev_filename)

The original profile was: profile.for-PR-orig_2023-12-04_18:10.json (open it via scalene --viewer and browse to the downloaded .json file).

It takes around 5 minutes, and most of the time is spent translating the 2x2 columns of Internal/Blocking_index into an index of tuples. That function also uses significant memory, together with the actual creation of the indexing columns: Scalene reports a peak of more than 16GB, a factor of 40 over the size of the input data.
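To make the bottleneck concrete, the two index constructions differ roughly like this (synthetic data, not the actual reader code):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
internal = rng.integers(0, 4, size=1_000_000)
blocking = rng.integers(0, 8, size=1_000_000)

# Slow: one Python tuple per row, stored in an object-dtype index.
slow_index = pd.Index(list(zip(internal, blocking)))

# Fast: a MultiIndex built directly from the integer arrays, with no
# per-row Python objects.
fast_index = pd.MultiIndex.from_arrays(
    [internal, blocking], names=["Internal", "Blocking_index"]
)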

New profile is: profile.for-PR_2023-12-04_18:04.json

Total execution time is down to 43s. Resource consumption of _read_correlators_binary (without validation) is now negligible: 2s and 800MB, i.e. down to a factor of 2 over the input size -- likely the price we have to pay for using Python.

The remaining memory bottlenecks are the validation steps, particularly those concerned with ensuring consistency of the internal indexing. They have not been the focus of optimisation so far; there are likely copy operations hidden in the expressions. Time-wise, the cross-validation between the VEV and correlator indexing seems to be the problem. It is admittedly a bit paranoid (or, more positively, it assumes a very general use case) to run the check for each individual timeslice and operator.
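For illustration, the contrast between a per-group check and a whole-frame check looks roughly like this (the frames and column names here are hypothetical stand-ins, not the actual validation code):

import pandas as pd

correlators = pd.DataFrame({
    "Time": [0, 0, 1, 1],
    "Internal": [0, 1, 0, 1],
    "Blocking_index": [0, 0, 0, 0],
})
vevs = pd.DataFrame({"Internal": [0, 1], "Blocking_index": [0, 0]})

# Paranoid: one comparison per (timeslice, operator) group, so the
# Python-level loop dominates for large frames.
ok_slow = all(
    set(group["Blocking_index"]) == set(vevs["Blocking_index"])
    for _, group in correlators.groupby(["Time", "Internal"])
)

# Cheaper: compare the index values once over the whole frame, which
# is enough if all timeslices share the same operator structure.
ok_fast = (
    set(correlators["Internal"]) == set(vevs["Internal"])
    and set(correlators["Blocking_index"]) == set(vevs["Blocking_index"])
)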

chillenzer commented 11 months ago

Added a switch perform_expensive_validation to freeze. The new profile is almost trivial: profile.without_validation_2023-12-07_09:00.json
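The shape of the switch is roughly this (CorrelatorEnsemble and the helper names are hypothetical stand-ins; only the perform_expensive_validation parameter is taken from the PR):

class CorrelatorEnsemble:  # hypothetical stand-in for the real container

    def __init__(self) -> None:
        self._frozen = False

    def _cheap_validation(self) -> None:
        pass  # structural checks that always run

    def _expensive_validation(self) -> None:
        pass  # the index cross-checks that dominated the profiles above

    def freeze(self, perform_expensive_validation: bool = True) -> "CorrelatorEnsemble":
        self._cheap_validation()
        if perform_expensive_validation:
            self._expensive_validation()
        self._frozen = True
        return self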

The last bottleneck-ish thing is that the detour via file.read() and np.frombuffer instead of np.fromfile doubles the memory: the whole file is first read into a Python bytes object and then copied into an array. A factor of 2 sounds like a lot, but given that our current use case will involve many small batches instead of one large one, this should not hinder use in production.
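For concreteness, the two read paths compare roughly like this (corr.dat and the dtype are placeholders; the .copy() stands in for however the reader ends up with a writable array):

import numpy as np

# Current path: the whole file lands in a Python bytes object first;
# np.frombuffer only gives a read-only view of it, so obtaining a
# writable array costs a second full-size allocation.
with open("corr.dat", "rb") as f:
    buf = f.read()
data = np.frombuffer(buf, dtype=np.float64).copy()

# Alternative: np.fromfile allocates the target array once and reads
# directly into it, keeping peak memory at ~1x the file size.
with open("corr.dat", "rb") as f:
    data = np.fromfile(f, dtype=np.float64)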

For comparison, here is a profile with np.fromfile: profile.without_validation_fromfile_2023-12-07_09:05.json

I'll open a separate issue #44 to track that last bottleneck and collect some ideas for fixing it there, too.

chillenzer commented 11 months ago

PS: I consider this ready for another review/approval.

chillenzer commented 11 months ago

Ready to review again from my side. We're at 22 commits already. Will you get me to 25? And do we count the commit-revert?

chillenzer commented 11 months ago

Okay. We're at 23 and all the bikes fit neatly. Anything else?

edbennett commented 11 months ago

Pretty sure you committed two unicycles masquerading as a single bike in that last commit, but let's call it a day there.

chillenzer commented 11 months ago

That's only because our shed's design didn't accommodate the need to park unicycles there. We should totally re-open and consider that possibility!