edbennett / glue_analysis

MIT License

New bottleneck: Sorting in `get_numpy` #46

Open chillenzer opened 11 months ago

chillenzer commented 11 months ago

In my current test run, `get_numpy` seems to be the bottleneck. Reading is no longer problematic (neither time-wise nor memory-wise), but a call to `get_pyerrors` waits an awfully long time for the internal `get_numpy` call (verified with a scalene profile) and becomes prohibitively expensive memory-wise for more than a few thousand configurations. Concerning the memory, I haven't managed to convince scalene to give me a clear answer yet, but given that the overwhelming majority of the time is spent in `get_numpy` and that the growth seems to be roughly linear, it is still the clear candidate.
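For cross-checking the memory side independently of scalene, something like the following could work. It is only a sketch using the standard library's `time.perf_counter` and `tracemalloc` rather than scalene itself; the helper name `measure` is made up for illustration and is not part of this repository.

```python
import time
import tracemalloc


def measure(func, *args, **kwargs):
    """Hypothetical helper: run func once and report its wall-clock time
    and the peak memory that tracemalloc saw during the call."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        seconds = time.perf_counter() - start
        # get_traced_memory returns (current, peak) in bytes
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, seconds, peak_bytes
```

Calling it for a few increasing numbers of configurations and comparing `peak_bytes` should show whether the growth is really linear. Note that tracing slows the call down, so the timing is only indicative.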

More specifically, it is the sorting in the `get_numpy` function. Why are we sorting there? It was convenient when I wrote the reader, but why aren't the index columns just in the correct order already? We could guarantee that right from the start and get rid of the sorting there.
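To make the idea concrete, here is a rough sketch of what skipping the sort could look like. It is not the actual `get_numpy` code: the function name, the column names `config`, `time` and `value`, and the assumption that every combination of index values occurs exactly once are all made up for illustration.

```python
import numpy as np
import pandas as pd


def to_numpy_skip_sort(frame, index_columns, value_column):
    """Hypothetical stand-in for get_numpy: sort only if the index
    columns are not already in lexicographic order."""
    indexed = frame.set_index(index_columns)
    if not indexed.index.is_monotonic_increasing:
        # Fallback for unordered input; this is the expensive path.
        indexed = indexed.sort_index()
    # Assumes a full grid: one row per combination of index values.
    shape = [frame[column].nunique() for column in index_columns]
    return indexed[value_column].to_numpy().reshape(shape)


# Illustrative data: 2 configurations with 3 time slices each.
frame = pd.DataFrame(
    {
        "config": [0, 0, 0, 1, 1, 1],
        "time": [0, 1, 2, 0, 1, 2],
        "value": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    }
)
array = to_numpy_skip_sort(frame, ["config", "time"], "value")
```

If the reader guarantees the order from the start, the `is_monotonic_increasing` check could become a plain assertion, or be dropped entirely.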