takluyver closed this 5 years ago.
The import profiling added in Python 3.7 is very useful for seeing what is causing the slowdown:
python3 -X importtime -c "from karabo_data import RunDirectory"
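To rank the report from the command above instead of scrolling through it, a small helper can parse it. This is a sketch: `import_times` is a hypothetical name, and it assumes the stderr format `-X importtime` has used since 3.7 (`import time: self [us] | cumulative | imported package`).

```python
import subprocess
import sys


def import_times(statement):
    """Run `statement` in a fresh interpreter with -X importtime and parse the report.

    Returns (cumulative_us, self_us, module_name) tuples, largest cumulative first.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        # Report lines look like: "import time:       312 |       1024 |   json"
        if not line.startswith("import time:"):
            continue
        fields = line[len("import time:"):].split("|")
        if len(fields) != 3:
            continue
        self_us, cumulative_us, name = (f.strip() for f in fields)
        if not self_us.isdigit():
            continue  # skip the "self [us] | cumulative | imported package" header
        rows.append((int(cumulative_us), int(self_us), name))
    rows.sort(reverse=True)
    return rows
```

For this PR you'd call `import_times("from karabo_data import RunDirectory")` and look at the first few rows. If the lazy imports are working, you'd expect pandas and xarray to drop out of the list, with h5py and numpy still near the top.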
The biggest remaining contributors are h5py and numpy. Delaying those imports would be more awkward, because they're used much more widely in the package.
The last two commits deprecate the top-level import of ZMQ streaming functionality, so:
from karabo_data import ZMQStreamer
# Should become:
from karabo_data.export import ZMQStreamer
This avoids importing zmq and msgpack when they're not used. The gain is much more marginal than from avoiding pandas and xarray (maybe 20 ms rather than hundreds of ms). But I don't think most uses of karabo_data need this functionality, and even 20 ms can make a difference if you're trying to make a command-line tool feel as instant as possible.
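The thread doesn't show how the old top-level name keeps working, so here is one possible approach, not necessarily what karabo_data does: a module-level `__getattr__` (PEP 562, Python 3.7+) that issues a `DeprecationWarning` and imports the submodule only when the old name is first used. The package `demo_pkg` and the helper `check_lazy_deprecation` are hypothetical; the package is written to a temporary directory so the sketch is self-contained.

```python
import importlib
import sys
import tempfile
import textwrap
import warnings
from pathlib import Path

# Hypothetical package "demo_pkg" mimicking the layout described in the PR:
# a top-level name that has moved to a .export submodule.
INIT_SRC = textwrap.dedent('''
    import warnings


    def __getattr__(name):
        # PEP 562: only called when normal attribute lookup on the module fails.
        if name == "ZMQStreamer":
            warnings.warn(
                "import ZMQStreamer from demo_pkg.export instead",
                DeprecationWarning,
                stacklevel=2,
            )
            from .export import ZMQStreamer  # deferred: runs only on first access
            return ZMQStreamer
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
''')

EXPORT_SRC = textwrap.dedent('''
    # The real module would import zmq and msgpack here; this stand-in imports nothing.
    class ZMQStreamer:
        pass
''')


def check_lazy_deprecation():
    """Return (export_loaded_after_import, export_loaded_after_access, warned)."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "demo_pkg"
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text(INIT_SRC)
        (pkg_dir / "export.py").write_text(EXPORT_SRC)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            import demo_pkg  # cheap: does not touch demo_pkg.export
            loaded_after_import = "demo_pkg.export" in sys.modules
            with warnings.catch_warnings(record=True) as caught:
                warnings.simplefilter("always")
                from demo_pkg import ZMQStreamer  # old, deprecated spelling
            loaded_after_access = "demo_pkg.export" in sys.modules
            warned = any(issubclass(w.category, DeprecationWarning) for w in caught)
            return loaded_after_import, loaded_after_access, warned
        finally:
            sys.path.remove(tmp)
            for mod_name in ("demo_pkg.export", "demo_pkg"):
                sys.modules.pop(mod_name, None)
```

The point of this design is that `import demo_pkg` on its own never loads `demo_pkg.export`, so in the real package zmq and msgpack would only be imported by code that actually asks for `ZMQStreamer`, while old code keeps working with a warning.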
I think at least @ebadkamil is using this (the mention is just so you're aware of the change).
LGTM
Thanks! Yup, I'm planning to make a PR to karaboFAI to change the ZMQStreamer import.
This cuts import time by 50-70% in my tests (from ~1 s to 0.3-0.5 s). Pandas and xarray in particular each import many modules, so there's a big benefit if we can avoid them.
Of course, you still have to import those packages if you want to use their data formats. But we don't need them for lsxfel, so this significantly improves its overall performance (in combination with #206).
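To reproduce a before/after comparison like the ~1 s to 0.3-0.5 s figure above, one option is to time a fresh interpreter several times and keep the fastest run. This is a sketch, not the method used for the numbers in this PR; `best_import_time` and `reduction_percent` are hypothetical names.

```python
import subprocess
import sys
import time


def best_import_time(statement, repeats=5):
    """Fastest wall-clock time (seconds) to run `statement` in a fresh interpreter.

    Includes interpreter start-up, which is what a command-line user actually waits for.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", statement],
                       check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return min(timings)


def reduction_percent(before_s, after_s):
    """Percentage reduction in time, e.g. 1.0 s -> 0.3 s is a 70% reduction."""
    if before_s <= 0:
        raise ValueError("before_s must be positive")
    return 100.0 * (before_s - after_s) / before_s
```

You'd run `best_import_time("import karabo_data")` on the old and new branches and compare the two with `reduction_percent`; `best_import_time("pass")` gives the bare interpreter start-up if you want to subtract it. Taking the minimum rather than the mean filters out interference from other processes, which is the same reasoning the `timeit` documentation gives.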