European-XFEL / karabo_data

Python tools to read and analyse data from European XFEL
https://karabo-data.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Delay some imports to where they are used #207

Closed by takluyver 5 years ago

takluyver commented 5 years ago

This speeds up import by 50-70% in my tests (from ~1s to 0.3-0.5s). Pandas and xarray in particular both import many files, so there's a big benefit if we can avoid them.

Of course, you still have to import those packages if you want to use their data formats. But lsxfel doesn't need them, so this significantly improves its overall performance (in combination with #206).
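
As a rough illustration of the pattern (not the code from this PR): the heavy import moves from module level into the one function that needs it. The stdlib csv and io modules stand in for pandas here so the sketch runs anywhere, and list_keys and to_csv_text are made-up names.

```python
def list_keys(records):
    """Cheap path: uses only builtins, so it pays no extra import cost."""
    return sorted({key for rec in records for key in rec})


def to_csv_text(records):
    """Expensive path: its dependencies are imported on first call."""
    # Deferred imports. In a real package this might be `import pandas`;
    # csv and io are lightweight stand-ins (an assumption for this sketch).
    import csv
    import io

    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list_keys(records), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Python caches imported modules in sys.modules, so only the first call to the expensive function pays the import cost; later calls just do a lookup.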

takluyver commented 5 years ago

The import profiling added in Python 3.7 is very useful for seeing what is causing the slowdown:

python3 -X importtime -c "from karabo_data import RunDirectory"

The biggest remaining things are h5py and numpy. It would be more awkward to delay those, because they're more widely used.
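
If the raw output is hard to scan, something like the following could rank the imports by cumulative time. It's only a sketch: parse_importtime, slowest and run_importtime are hypothetical helper names, and it assumes the "import time: self | cumulative | name" line layout that -X importtime writes to stderr.

```python
import subprocess
import sys


def parse_importtime(stderr_text):
    """Turn -X importtime output into (name, self_us, cumulative_us) tuples."""
    rows = []
    prefix = "import time:"
    for line in stderr_text.splitlines():
        if not line.startswith(prefix):
            continue
        parts = [p.strip() for p in line[len(prefix):].split("|")]
        # Skip anything that isn't three columns, and the header row,
        # whose first column is text rather than a number.
        if len(parts) != 3 or not parts[0].isdigit():
            continue
        rows.append((parts[2], int(parts[0]), int(parts[1])))
    return rows


def slowest(rows, n=10):
    """Return the n rows with the largest cumulative time."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


def run_importtime(statement):
    """Run a statement in a fresh interpreter and return its profile text."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    return result.stderr
```

For example, slowest(parse_importtime(run_importtime("from karabo_data import RunDirectory"))) would list the ten most expensive imports, assuming karabo_data is installed.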

takluyver commented 5 years ago

The last two commits deprecate the top-level import of ZMQ streaming functionality, so:

from karabo_data import ZMQStreamer
# Should become:
from karabo_data.export import ZMQStreamer

This avoids importing zmq & msgpack if they're not used. This is a much more marginal gain than avoiding pandas and xarray - maybe 20 ms instead of hundreds of ms. But I don't think most uses of karabo_data use this functionality, and even 20 ms can make a difference if you're trying to make a command-line tool feel as instant as possible.
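
One way such a deprecation can be done, sketched here and not necessarily what these commits do: leave a thin wrapper at the old location that warns and only then imports the real thing. deprecated_alias is a hypothetical helper, and fractions.Fraction stands in for karabo_data.export.ZMQStreamer so the example runs without zmq installed.

```python
import importlib
import warnings


def deprecated_alias(module_name, attr, new_import):
    """Return a callable that warns, then forwards to module_name.attr."""
    def wrapper(*args, **kwargs):
        warnings.warn(
            "Importing {} from the top level is deprecated; "
            "use `{}` instead".format(attr, new_import),
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this wrapper
        )
        # The real module is only imported here, on first use.
        target = getattr(importlib.import_module(module_name), attr)
        return target(*args, **kwargs)

    wrapper.__name__ = attr
    return wrapper


# Stand-in for demonstration (an assumption, not karabo_data code):
# Fraction plays the role of ZMQStreamer.
Fraction = deprecated_alias(
    "fractions", "Fraction", "from fractions import Fraction"
)
```

The trade-off is that the old name becomes a plain function, not the class itself, so isinstance checks and subclassing via the old import path would stop working.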

tmichela commented 5 years ago

I think at least @ebadkamil is using this (the mention is just so you're aware of the change).

LGTM

takluyver commented 5 years ago

Thanks! Yup, I'm planning to make a PR to karaboFAI to change the ZMQStreamer import.