Ouranosinc / xclim

Library of derived climate variables, ie climate indicators, based on xarray.
https://xclim.readthedocs.io/en/stable/
Apache License 2.0

Speed up import #1948

Open huard opened 1 month ago

huard commented 1 month ago

Addressing a Problem?

Import takes 2.5s on my laptop.

Benchmarked with python -X importtime test.py, where test.py contains only import xclim.
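To make the benchmark above easier to compare between runs, the stderr output of -X importtime can be parsed and sorted. This is only a rough sketch: the helper names parse_importtime and slowest_imports are made up for illustration, and it assumes the usual "import time: self [us] | cumulative | imported package" line format that CPython prints.

```python
import subprocess
import sys

PREFIX = "import time:"


def parse_importtime(stderr_text):
    """Parse `python -X importtime` output into (module, self_us, cumulative_us) tuples."""
    rows = []
    for line in stderr_text.splitlines():
        if not line.startswith(PREFIX):
            continue
        fields = line[len(PREFIX):].split("|")
        if len(fields) != 3:
            continue
        self_us, cumulative_us, name = (field.strip() for field in fields)
        if not (self_us.isdigit() and cumulative_us.isdigit()):
            continue  # skip the header line
        rows.append((name, int(self_us), int(cumulative_us)))
    return rows


def slowest_imports(module, top=10):
    """Import `module` in a fresh interpreter and return its slowest imports."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = parse_importtime(result.stderr)
    # Sort by cumulative time, which includes nested imports.
    return sorted(rows, key=lambda row: row[2], reverse=True)[:top]
```

Calling slowest_imports("xclim") in an environment where xclim is installed should list the heaviest modules first. As noted in this issue, cumulative times depend on import order, so they say little about a dependency taken in isolation.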

Potential Solution

For reference, here are import times for some of our dependencies. Note that these numbers are only valid in the xclim context; you'd get different results by testing them individually, since they import each other.

Additional context

Code for lazy import (https://docs.python.org/3/library/importlib.html#implementing-lazy-imports)

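import importlib.util
import sys


def lazy_import(name):
    # Locate the module without executing it.
    spec = importlib.util.find_spec(name)
    # Wrap the real loader so execution is deferred until first attribute access.
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    # Register the placeholder so later imports reuse it.
    sys.modules[name] = module
    loader.exec_module(module)
    return module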
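As a small illustration of what this helper does, the sketch below lazily imports the stdlib module colorsys (chosen arbitrarily as a stand-in for a heavy dependency) and checks that it is only executed on first attribute access. The type checks rely on how CPython's LazyLoader swaps the module class, which is an implementation detail and not a documented guarantee.

```python
import importlib.util
import sys
import types


def lazy_import(name):
    # Same helper as in the importlib documentation.
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")

# Before any attribute access the object is a lazy placeholder,
# i.e. a subclass of ModuleType rather than ModuleType itself (CPython detail).
deferred = type(colorsys) is not types.ModuleType

# The first attribute access runs the module body for real.
red_hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)

# After loading, the placeholder has become a plain module.
loaded = type(colorsys) is types.ModuleType
```

The cost of the import is not removed, only moved to the first use, so this helps start-up time only for modules that a given session never touches.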

Note that if we lazy import indicators, then they're not in the xclim registry. So the virtual module creation, which relies on the registry, would need to trigger their import.
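One way to picture the registry concern is a registry that knows where each indicator lives and imports it only when it is requested. The sketch below is hypothetical and is not how the xclim registry works: LazyRegistry is an invented name, and stdlib functions stand in for indicators.

```python
import importlib


class LazyRegistry:
    """Hypothetical registry mapping names to (module, attribute) locations."""

    def __init__(self, locations):
        self._locations = dict(locations)
        self._loaded = {}

    def __contains__(self, name):
        # Membership checks never trigger an import.
        return name in self._locations

    def __iter__(self):
        return iter(self._locations)

    def __getitem__(self, name):
        # Import the defining module on first access, then cache the object.
        if name not in self._loaded:
            module_name, attr = self._locations[name]
            module = importlib.import_module(module_name)
            self._loaded[name] = getattr(module, attr)
        return self._loaded[name]


# Stand-in entries; real entries would point at indicator modules.
registry = LazyRegistry(
    {"SQRT": ("math", "sqrt"), "MEAN": ("statistics", "mean")}
)
```

Listing names stays cheap with this design, while anything that needs the actual objects, such as virtual module creation, would pay the import cost at that point.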

aulemahal commented 1 month ago

I ran the same tests and piped the output through tuna, like I did in #1135. Here's a snapshot:

[image: tuna snapshot of the import-time profile]

I fear that most time is not lost by loading indicators. xclim.indices shows up at the top only because of the order of operations. The longest-loading submodule seems to be in the fire indicators, and that might be numba jitting functions eagerly rather than lazily. Some gain could be made there.

huard commented 1 month ago

Regarding the load time of indices, I commented out from indices import * in the __init__ and commented out another side import of indices elsewhere in indicators.py. I then computed the difference between the import time in this scenario and in the base scenario.

SarahG-579462 commented 1 month ago

This would certainly help in #1955, since the main slow-down for command-line tools is the import time for xclim (followed by the start-up time for python).

Would it be possible to have the registry for indices (needed for the CLI) be created during pip install xclim or conda install xclim? The numba functions could also be compiled at that point, if needed, couldn't they? We don't use ufuncs for them, and the lack of ufunc support is one of the limitations of ahead-of-time compilation.

aulemahal commented 1 month ago

We could export a JSON of the indicators and parameters on install. That would break the idea that "virtual submodules" are loaded live, but maybe that's ok for the CLI. However, as long as we don't change how the indicator modules are structured, I don't think that would improve anything other than xclim info, no? Once you've found the indicator you want to run, "loading" it will result in importing the rest of the package anyway.
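A rough sketch of what such an export could look like, using only the standard library. Everything here is hypothetical: describe, export_registry and the toy hot_days function are invented for illustration, and real indicators carry much richer metadata than a signature and a docstring.

```python
import inspect
import json


def describe(func):
    """Summarize a callable: first docstring line plus parameter defaults."""
    doc = inspect.getdoc(func) or ""
    params = {}
    for name, param in inspect.signature(func).parameters.items():
        # Required parameters are stored as null; defaults are stored as their repr.
        has_default = param.default is not inspect.Parameter.empty
        params[name] = {"default": repr(param.default) if has_default else None}
    return {"title": doc.splitlines()[0] if doc else "", "parameters": params}


def export_registry(registry):
    """Serialize a {name: callable} mapping to a JSON string."""
    summary = {name: describe(func) for name, func in sorted(registry.items())}
    return json.dumps(summary, indent=2)


def hot_days(tasmax, thresh="25 degC", freq="YS"):
    """Number of days with maximum temperature above a threshold."""
    raise NotImplementedError  # toy stand-in, never called


catalogue = export_registry({"HOT_DAYS": hot_days})
```

A CLI could read such a file to list indicators and their parameters without importing the package, but running an indicator would still require the full import.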

Another issue with ahead-of-time compilation is that we need to specify all possible signatures in advance, no? Not impossible, but it seems sub-optimal.

SarahG-579462 commented 1 month ago

I think the command is xclim indices?

Indeed, and with the deprecation of numba.pycc coming, this doesn't seem like the best approach.