MazinLab / MKIDPipeline

The MKID Data Pipeline
http://web.physics.ucsb.edu/~bmazin/

~20% of program run time is spent in get_flag #107

Open ld-cd opened 3 weeks ago

ld-cd commented 3 weeks ago

On my desktop, running our current mkidpipeline test, 23% of the program runtime (and the overwhelming majority of the wavecal runtime) is spent in get_flag, linked below. The situation appears to be much worse when run on wheatley.

I have a suspicion that this is accidentally quadratic in the number of pixels; a doubling test like the sketch below would confirm it.

https://github.com/MazinLab/MKIDPipeline/blob/2dbf0dc3199afae361775ce5304ff4e648347889/mkidpipeline/steps/wavecal.py#L2711C70-L2711C78
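One way to check the "accidentally quadratic" suspicion empirically is a doubling test: time the flag lookup over increasing pixel counts and watch how the ratio grows. The sketch below uses a toy workload; it would need to be pointed at whatever wrapper actually exercises get_flag, which is not called here.

```python
import time

def best_time(fn, n, repeats=3):
    """Best-of-N wall time for fn(n)."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(n)
        best = min(best, time.perf_counter() - t0)
    return best

def check_scaling(fn, sizes=(500, 1000, 2000, 4000)):
    # Quadratic behaviour shows up as the time roughly quadrupling per doubling.
    times = [best_time(fn, n) for n in sizes]
    for i in range(1, len(sizes)):
        print(f"{sizes[i - 1]} -> {sizes[i]}: time ratio {times[i] / times[i - 1]:.2f}")

if __name__ == "__main__":
    # Toy quadratic stand-in; replace with the real flag-lookup wrapper.
    check_scaling(lambda n: [sum(range(n)) for _ in range(n)])
```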

ld-cd commented 3 weeks ago

The majority of that time is spent in ._is_empty.

ld-cd commented 2 weeks ago

The time is spent in the pickle deserialization that occurs here https://github.com/MazinLab/MKIDPipeline/blob/2dbf0dc3199afae361775ce5304ff4e648347889/mkidpipeline/steps/wavecal.py#L1229 when the numpy array is loaded from the backing npz file. This occurs once for every h5 file that has a wavecal applied, and it appears to happen in a different thread (well, a different process: multiprocessing, not threading). The result of the deserialization is not cached between wavecal applications.
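Part of this is plain numpy behaviour rather than anything pipeline-specific: an npz member that contains Python objects has to round-trip through pickle (hence the allow_pickle=True in the transcript below), and NpzFile does not cache members, so every access re-reads and re-deserializes the zip entry. A small self-contained comparison with made-up data (not the real wavecal solution) illustrates the gap between an object-dtype member and a plain numeric one:

```python
import time
import numpy as np

# Made-up stand-ins: an object-dtype array (forces pickle) vs a plain float array.
obj = np.empty(200_000, dtype=object)
obj[:] = [{"params": [float(i)] * 8} for i in range(obj.size)]
plain = np.random.rand(200_000, 8)

np.savez("demo_object.npz", fit_array=obj)
np.savez("demo_plain.npz", fit_array=plain)

for path, kwargs in (("demo_object.npz", {"allow_pickle": True}),
                     ("demo_plain.npz", {})):
    with np.load(path, **kwargs) as f:
        t0 = time.perf_counter()
        _ = f["fit_array"]  # NpzFile deserializes the member on every access
        print(f"{path}: {time.perf_counter() - t0:.3f} s")
```

If the fits could be flattened into plain numeric or structured arrays, the pickle cost (and the allow_pickle requirement) would go away entirely; otherwise caching the deserialized array is the cheaper fix.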

Straight from SSD on my Zen 4 desktop (which has higher single-core performance than either server, likely the relevant statistic here) and with a hot file system cache, this step takes 4-5 seconds. I suspect this is the main bounding performance issue in the pipeline; @michaelarena has been able to confirm that it also occurs on the servers.

It is likely possible to solve this with a hack, but in my opinion we will be playing whack-a-mole with these kinds of perf issues. I personally think a refactor of both the way we do serialization and the way we handle parallelism (probably by farming it off to professional engineers, i.e. an OLAP database) is in order.

(mkidpipeline-3.12) [aled@speedygoose hip109427-subset]$ ipython
/usr/lib/python3.12/site-packages/IPython/core/interactiveshell.py:937: UserWarning: Attempting to work in a virtualenv. If you encounter problems, please install IPython inside the virtualenv.
  warn(
Python 3.12.4 (main, Jun  7 2024, 06:33:07) [GCC 14.1.1 20240522]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.26.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import astropy.constants

In [2]: import numpy as np

In [3]: blah = np.load("wavecal0_7a1032fc_24867917.wavecal.npz", allow_pickle=True)

In [4]: %time x = blah['fit_array']
CPU times: user 5.93 s, sys: 312 ms, total: 6.25 s
Wall time: 4.4 s

In [5]: %time x = blah['fit_array']
CPU times: user 5.97 s, sys: 159 ms, total: 6.13 s
Wall time: 6.14 s

In [6]: %time x = blah['fit_array']
CPU times: user 6.48 s, sys: 6.04 ms, total: 6.48 s
Wall time: 6.49 s

In [7]: %time x = blah['fit_array']
CPU times: user 5.91 s, sys: 16.5 ms, total: 5.92 s
Wall time: 5.93 s

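For reference, the "hack" version of a fix could be as small as memoizing the deserialization per worker process; the helper below is a hypothetical stand-in, not the pipeline's actual Solution API:

```python
from functools import lru_cache

import numpy as np

@lru_cache(maxsize=8)
def load_fit_array(npz_path: str):
    """Deserialize fit_array once per process and reuse it for every h5 file."""
    with np.load(npz_path, allow_pickle=True) as f:
        return f["fit_array"]
```

Because the wavecal application runs in separate multiprocessing workers, this only helps if the workers are long-lived (or the array is loaded once and shared), which is part of why a broader refactor of the serialization and parallelism may be the better answer.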

ld-cd commented 2 weeks ago

@nzivkov, can you let me know the next time you run the pipeline on a full set of data? I'd like to do some profiling and determine whether this is the bottleneck for real data.

ld-cd commented 2 weeks ago

For reference, I profiled this with the following command (Python 3.12 required):

perf record -e cycles --sample-cpu  --aio -z --call-graph dwarf -F 99 -- python -X perf -m mkidpipeline.mkidpipe --make-dir --make-outputs

I've added options here for more info, but you can get 99% of the way there with the defaults: https://docs.python.org/3/howto/perf_profiling.html
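As an aside, if you only want Python frames for a specific region rather than the whole run, Python 3.12 also lets you toggle the perf trampoline programmatically (Linux only); a minimal sketch with a stand-in workload rather than a real pipeline call:

```python
import math
import sys

def hot_loop():
    # Stand-in workload; in practice this would wrap the wavecal application step.
    return sum(math.sqrt(i) for i in range(2_000_000))

if sys.version_info >= (3, 12):
    sys.activate_stack_trampoline("perf")  # same effect as -X perf, but scoped
    try:
        hot_loop()
    finally:
        sys.deactivate_stack_trampoline()
```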

ld-cd commented 2 weeks ago

I will also note that my desktop is faster than wheatley on this test. Going from a 16-core wavecal config to a single-core wavecal config increased the runtime from 2:53 to 4:52, which is nowhere near a factor of 16, so the extra cores that wheatley and glados have are not being fully utilized.
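For a rough sense of how little of that wall time is actually parallel, you can back out the Amdahl's-law parallel fraction implied by those two timings (a back-of-the-envelope sketch, not a model of the real pipeline):

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n), solved for the parallel fraction p.
t_one_core = 4 * 60 + 52   # 4:52 with a single wavecal core, in seconds
t_sixteen = 2 * 60 + 53    # 2:53 with sixteen wavecal cores, in seconds
n = 16

speedup = t_one_core / t_sixteen
p = (1 - 1 / speedup) / (1 - 1 / n)
print(f"speedup: {speedup:.2f}x, implied parallel fraction: {p:.0%}")
# ~1.7x speedup, i.e. only ~40-45% of the runtime benefits from the extra cores
```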

ld-cd commented 2 weeks ago

I've used hotspot to view this, but the Firefox Profiler is probably more straightforward. If you would like to look at these perf results, they are here: https://share.firefox.dev/3YXNR3D