desihub / desispec

DESI spectral pipeline
BSD 3-Clause "New" or "Revised" License
36 stars 24 forks source link

improve import time by postponing optional imports #2379

Open sbailey opened 1 month ago

sbailey commented 1 month ago

@dmargala identified that our occasional PMI init failures are related to large variations in the startup time of different MPI ranks before they connect to MPI. These were traced to variations in module-level import times, which further identified several surprisingly expensive steps. Some of these could be deferred into importing optional libraries only if needed by specific functions.

To explore further:

desispec.io.spectra -> desispec.spectra -> specutils dominates the import time, but is only needed for Spectra.to_specutils and Spectra.from_specutils. Part of the specutils expense is importing astropy.nddata which imports dask stuff, so the astropy.nddata imports would also have to be postponed.

desispec.pixgroup -> import healpy is expensive and appears to be unneeded.

desispec.tsnr -> astropy.convolution is expensive again because it touches astropy.nddata -> dask. This import could be deferred and/or replaced with scipy.signal.fftconvolve.

etc.

python -Ximporttime -c "import desispec.spectra" gives a dizzying amount of info. @dmargala has visualization wrappers on that to simplify the output. Sharing that and/or a hackfest with him to refactor some of these imports could be useful.

moustakas commented 1 month ago

For the record, I've noted the desispec import overhead when benchmarking fastspecfit (which has desispec as one of its core dependencies) as well, so I appreciate the type of cleanup associated with this ticket.

The other place where a large number of (potentially unnecessary) imports occur is in, e.g., https://github.com/desihub/desispec/blob/main/py/desispec/io/__init__.py, and potentially other __init__.py files.

E.g., if all I want is desispec.io.spectra.read_spectra, a bunch of stuff related to data reduction are imported along for the ride.

weaverba137 commented 1 month ago

I would not object to deferring the import of specutils unless and until the mentioned methods are used.

sbailey commented 1 month ago

Use with caution, but...

While chasing a different issue, @segasai noticed https://peps.python.org/pep-0562/ that we can define module-level __getattr__ to implement lazy (just-in-time) loading of submodules. This might be useful for desispec.io where for convenience we import submodule functions like read_frame, read_spectra, read_image, read_raw into the top-level desispec.io namespace, but that currently has the side effect of loading all of them and their sub-dependencies even if you only need one of them. If implemented, include lots of comments about what is really going on, since @segasai discovered this while we were scratching our heads about odd side effects from scipy doing this.

sbailey commented 1 month ago

A wrapper script for easier human parsing from @dmargala : https://gist.github.com/dmargala/fcb1e9adf14f662155e138fe5f266226#file-importtrace-py

e.g.

[login29 importtime] python importtrace.py desispec.io --min-elapsed 500
['/global/common/software/desi/perlmutter/desiconda/20240425-2.2.0/conda/bin/python', '-Ximporttime', '-c', 'import desispec.io']
importtime: ['desispec.io'] (total=3817ms)
                                         | start(ms) self(ms) total(ms) depth name
---------------------------------------- | ----------------------------------------
▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓     |        35        0      3464     0 desispec.io
░░░░░░░░░░░                              |        44        0      1056     1   desispec.io.fiberflat
           ░░░░░░░░░░░░░░░░░░░░          |      1100        0      1871     1   desispec.io.spectra
           ░░░░░░░░░░░░░░░░░░░░          |      1100        0      1870     2     desispec.spectra
           ░░░░░░░░░░░░░░░░░░░░          |      1100      130      1869     3       specutils
           ░░░░░░░░░░░░░░░░░             |      1101        0      1636     4         specutils.spectra
           ░░░░░░░░░░░░░░░░░             |      1101        0      1635     5           specutils.spectra.spectrum1d
           ░░░░░░░                       |      1101        0       656     6             astropy.nddata
           ░░░░░░░                       |      1101        0       648     7               astropy.nddata.blocks
           ░░░░░░░                       |      1101        0       648     8                 astropy.nddata.decorators
           ░░░░░░░                       |      1101        0       648     9                   astropy.nddata.nddata
            ░░░░░░                       |      1237        0       510    10                     dask.array
                      ░░░░░░             |      2131        1       602     6             ndcube
                      ░░░░░░             |      2139        0       591     7               ndcube.ndcube