ld-cd opened 3 weeks ago
The majority of that time is spent in the pickle deserialization that occurs here https://github.com/MazinLab/MKIDPipeline/blob/2dbf0dc3199afae361775ce5304ff4e648347889/mkidpipeline/steps/wavecal.py#L1229 when the numpy array is loaded from the backing npz file. This occurs once for every h5 file that has a wavecal applied, appears to happen in a separate worker each time (multiprocessing, not threading), and the result of the deserialization is not cached between wavecal applications.
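Since the deserialized array never changes between applications, even a per-process cache would remove most of the repeated cost. A minimal sketch, assuming a hypothetical `load_fit_array` helper rather than the pipeline's actual API:

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=4)
def load_fit_array(npz_path: str) -> np.ndarray:
    """Deserialize fit_array once per file; repeat calls in the same
    process return the cached array instead of re-unpickling."""
    with np.load(npz_path, allow_pickle=True) as solution:
        return solution["fit_array"]
```

An `lru_cache` like this only helps if the same worker process applies multiple wavecals; because the load happens under multiprocessing, sharing the array across workers would need something more, e.g. loading once in the parent or shared memory.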
Straight from SSD on my Zen 4 desktop (which has higher single-core performance than either server, likely the relevant statistic here) with a hot filesystem cache, this step takes 4-5 seconds. I suspect this is the main bounding performance issue in the pipeline; @michaelarena has been able to confirm that it also occurs on the servers.
It is likely possible to solve this with a hack, but in my opinion we will be playing whack-a-mole with these kinds of perf issues. I personally think a refactor of both the way we do serialization and the way we handle parallelism (probably by farming it off to professional engineers with an OLAP database) is in order.
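For the serialization half, one possible direction (a sketch, not a claim about what the refactor should look like) is to stop storing pickled Python objects entirely and flatten the per-pixel fit results into a structured numpy array with fixed-size numeric fields; the field names below are hypothetical:

```python
import numpy as np

# Hypothetical flattened layout for per-pixel fit results.
fit_dtype = np.dtype([
    ("res_id", np.uint32),
    ("flag", np.uint16),
    ("coeffs", np.float64, (3,)),
    ("coeff_errors", np.float64, (3,)),
])


def save_fits(path: str, fits: np.ndarray) -> None:
    # A structured dtype with no object fields needs no pickling at all...
    np.save(path, fits, allow_pickle=False)


def load_fits(path: str) -> np.ndarray:
    # ...and a plain .npy file can be memory-mapped, so "loading" it
    # costs essentially nothing until the data is actually touched.
    return np.load(path, mmap_mode="r", allow_pickle=False)
```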
```
(mkidpipeline-3.12) [aled@speedygoose hip109427-subset]$ ipython
/usr/lib/python3.12/site-packages/IPython/core/interactiveshell.py:937: UserWarning: Attempting to work in a virtualenv. If you encounter problems, please install IPython inside the virtualenv.
  warn(
Python 3.12.4 (main, Jun 7 2024, 06:33:07) [GCC 14.1.1 20240522]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.26.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import astropy.constants

In [2]: import numpy as np

In [3]: blah = np.load("wavecal0_7a1032fc_24867917.wavecal.npz", allow_pickle=True)

In [4]: %time x = blah['fit_array']
CPU times: user 5.93 s, sys: 312 ms, total: 6.25 s
Wall time: 4.4 s

In [5]: %time x = blah['fit_array']
CPU times: user 5.97 s, sys: 159 ms, total: 6.13 s
Wall time: 6.14 s

In [6]: %time x = blah['fit_array']
CPU times: user 6.48 s, sys: 6.04 ms, total: 6.48 s
Wall time: 6.49 s

In [7]: %time x = blah['fit_array']
CPU times: user 5.91 s, sys: 16.5 ms, total: 5.92 s
Wall time: 5.93 s
```
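Note that the repeated ~6 s cost above is expected behavior: `NpzFile` re-reads the zip member and re-runs the unpickling on every `__getitem__`, so each `blah['fit_array']` pays the full price. Binding the result once avoids the repeat cost:

```python
import time

import numpy as np

solution = np.load("wavecal0_7a1032fc_24867917.wavecal.npz", allow_pickle=True)

t0 = time.perf_counter()
fit_array = solution["fit_array"]   # zip member read + unpickle: seconds
print(f"first access: {time.perf_counter() - t0:.2f} s")

t0 = time.perf_counter()
x = fit_array                       # plain rebinding: effectively free
print(f"reuse: {time.perf_counter() - t0:.6f} s")
```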
@nzivkov can you let me know the next time you run the pipeline on a full set of data, so that I can do some profiling and determine whether this is the bottleneck for real data?
For reference, I profiled this with the following command (Python 3.12 required):

```
perf record -e cycles --sample-cpu --aio -z --call-graph dwarf -F 99 -- python -X perf -m mkidpipeline.mkidpipe --make-dir --make-outputs
```
I've added options here for more info, but you can get 99% of the way there with the defaults: https://docs.python.org/3/howto/perf_profiling.html
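If you can't pass `-X perf` on the command line (say, when profiling from inside an existing session), Python 3.12 also lets you enable the perf trampoline at runtime:

```python
import sys

sys.activate_stack_trampoline("perf")  # perf-compatible Python frames (3.12+)
# ... run the code under study here ...
sys.deactivate_stack_trampoline()
```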
I will also note that my desktop is faster than wheatley on this test, and going from a 16-core wavecal config to a single-core wavecal config only increased the runtime from 2:53 to 4:52, which is nowhere near a factor of 16, so the extra cores that wheatley and glados have are not being fully utilized.
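A quick Amdahl's-law sanity check on those numbers bears this out: 16x the cores buying only a ~1.7x speedup implies well under half of the runtime is parallelized.

```python
# Solve S = 1 / ((1 - p) + p/n) for the parallel fraction p, using the
# 4:52 single-core and 2:53 16-core wavecal times quoted above.
t1, tn, n = 4 * 60 + 52, 2 * 60 + 53, 16
speedup = t1 / tn                     # ~1.69x
p = (1 - 1 / speedup) / (1 - 1 / n)   # implied parallel fraction
print(f"speedup {speedup:.2f}x -> parallel fraction ~{p:.0%}")  # ~43%
```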
I've used hotspot to view this, but the Firefox Profiler is probably more straightforward. If you would like to look at these perf results, go here: https://share.firefox.dev/3YXNR3D
On my desktop with our current mkidpipeline test, 23% of the program runtime, and the overwhelming majority of the wavecal runtime, is spent in get_flag here: https://github.com/MazinLab/MKIDPipeline/blob/2dbf0dc3199afae361775ce5304ff4e648347889/mkidpipeline/steps/wavecal.py#L2711C70-L2711C78. The situation appears to be much worse when run on wheatley. I have a suspicion that this is accidentally quadratic in the number of pixels.
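To make the accidentally-quadratic suspicion concrete (this is a hypothetical illustration, not the actual wavecal.py code): if get_flag resolves a single pixel by scanning the full list of fit results, then calling it once per pixel is O(n_pixels**2), and building an index once drops the per-pixel lookup to O(1):

```python
def get_flag_linear(fits, res_id):
    # Suspected pattern: an O(n) scan performed once per pixel.
    for fit in fits:
        if fit["res_id"] == res_id:
            return fit["flag"]


def build_flag_index(fits):
    # Do the O(n) pass once; every subsequent lookup is a dict hit.
    return {fit["res_id"]: fit["flag"] for fit in fits}
```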