chillenzer closed this pull request 11 months ago
Added switch `perform_expensive_validation` to `freeze`. New profile is almost trivial:

profile.without_validation_2023-12-07_09:00.json
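For orientation, a minimal sketch of what the switch could look like; the signature and the helper names are assumptions, not the actual code:

```python
# Rough sketch only; the real freeze() in this repo may differ, and the helper
# names below are placeholders, not the actual implementation.
def _read_correlators_binary(filename):
    ...  # existing binary reader discussed in the profiles below

def _validate_index_consistency(data):
    ...  # stand-in name for the expensive consistency checks

def freeze(filename, perform_expensive_validation=True):
    data = _read_correlators_binary(filename)
    if perform_expensive_validation:
        _validate_index_consistency(data)
    return data
```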
Last bottleneck-ish thing is that the detour via `file.read()` and `np.frombuffer` instead of `np.fromfile` doubles the memory, because it first reads everything into a Python (byte) string and then copies that into an array. A factor of 2 sounds like a lot, but given that our current use case will involve a lot of small batches instead of one large one, that should not hinder its use in production.

For comparison, a profile with `np.fromfile`:

profile.without_validation_fromfile_2023-12-07_09:05.json
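To make the difference concrete, here is a minimal sketch of the two read paths (file name and dtype are placeholders):

```python
import numpy as np

filename, dtype = "correlators.bin", np.dtype("<f8")  # placeholders

# Current detour: the whole file is held once as a Python bytes object and,
# once a writable array is needed, a second time as the array itself, which
# explains the peak-memory factor of roughly 2 per batch.
with open(filename, "rb") as f:
    raw = f.read()
data = np.frombuffer(raw, dtype=dtype).copy()  # .copy() makes it writable

# Direct read: np.fromfile fills the array straight from the file and avoids
# the intermediate bytes object.
data = np.fromfile(filename, dtype=dtype)
```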
I'll open a separate issue #44 to track that last bottleneck and put some ideas to fix it there, too.
PS: Considering this ready for another review/approval.
Ready to review again from my side. We're at 22 commits already. Will you get me to 25? And do we count the commit-revert?
Okay. We're at 23 and all the bikes fit neatly. Anything else?
Pretty sure you committed two unicycles masquerading as a single bike in that last commit, but let's call it a day there.
That's only because our shed's design didn't accommodate the need to park unicycles there. We should totally re-open and consider that possibility!
Description
Moving indexing columns into `pd.MultiIndex` for performance reasons, which entails minor adaptations in the functional code, moderate adaptations in the validation code, and major adaptations in the test data/cases. Also using an integer enumeration of the operator index and the blocking level index. It could be nice to provide a function that can translate these, e.g., for finding a particular operator-blocking combination, but that's likely not necessary for the use case we currently have (a rough sketch follows below).
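For illustration, a rough sketch of the indexing scheme described above; column names, level names, and the translation mapping are made up, not the repo's actual ones:

```python
import pandas as pd

# Long-format data with explicit indexing columns, as before the change.
df = pd.DataFrame(
    {
        "Operator_index": [0, 0, 1, 1],   # integer enumeration of operators
        "Blocking_index": [0, 1, 0, 1],   # integer enumeration of blocking levels
        "Time_index": [0, 0, 0, 0],
        "Correlator": [1.0, 0.9, 0.8, 0.7],
    }
)

# Move the indexing columns into a pd.MultiIndex so that lookups and alignment
# use the index machinery instead of repeated column comparisons.
df = df.set_index(["Operator_index", "Blocking_index", "Time_index"])

# A possible translation helper for finding a particular operator/blocking
# combination from human-readable names (hypothetical mapping):
operator_ids = {"pion": 0, "rho": 1}
blocking_ids = {"unblocked": 0, "blocked": 1}
selection = df.loc[(operator_ids["pion"], blocking_ids["blocked"])]
```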
Profiles
I've profiled the code.

Original profile was: profile.for-PR-orig_2023-12-04_18:10.json (view via `scalene --viewer` and browse to the downloaded `.json` file). It's around 5 min, and most of the time is spent in translating the 2x2 columns of `Internal`/`Blocking_index` into an index of tuples. That function also uses significant memory, together with the actual creation of the indexing columns. Scalene reports a peak of more than 16 GB, which is a factor of 40 compared to the input data.

New profile is: profile.for-PR_2023-12-04_18:04.json
Total execution time is down to 43 s. Resource consumption of `_read_correlators_binary` (without validation) is negligible by now: 2 s and 800 MB, which is down to a factor of 2 over the input data -- likely the price we have to pay for using Python.

Memory bottlenecks are the validation steps, particularly those concerned with ensuring consistency of the internal indexing. That has not been the focus of optimisation so far; likely there are copy operations hidden in the expressions. Time-wise, the cross-validation between VEV and correlator indexing seems to be the problem. It is admittedly a bit paranoid (or, more positively, it assumes a very general use case) to run the check for each individual timeslice and operator.
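To illustrate the shape of that check, a hedged sketch assuming both frames carry a `pd.MultiIndex` with `Time_index` and `Operator_index` levels (all names are placeholders, not the repo's actual validation code):

```python
import pandas as pd

def cross_validate_indexing(corr: pd.DataFrame, vev: pd.DataFrame) -> None:
    """Paranoid variant: one consistency check per individual timeslice."""
    for time in corr.index.get_level_values("Time_index").unique():
        corr_ops = set(
            corr.xs(time, level="Time_index").index.get_level_values("Operator_index")
        )
        vev_ops = set(
            vev.xs(time, level="Time_index").index.get_level_values("Operator_index")
        )
        if not corr_ops <= vev_ops:
            raise ValueError(f"Operators without a VEV entry at timeslice {time}.")

# A cheaper variant would compare the de-duplicated operator sets once for the
# whole index instead of once per timeslice (and per operator), trading the
# very general use case for speed, which is roughly the trade-off above.
```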