Closed samuelcolvin closed 2 years ago
We have looked at this before but unfortunately there may not be much we can do. We import NumPy always, and Pandas if it is available. On my machine those alone account for the large majority of time:
In [1]: %timeit !python -c 'from bokeh.plotting import figure'
533 ms ± 5.29 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [2]: %timeit !python -c 'import pandas'
295 ms ± 2.78 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [3]: %timeit !python -c 'import numpy'
140 ms ± 3.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
There is nothing we can do about that overhead on our end.
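For anyone who wants to repeat the measurement above without IPython, here is a minimal sketch that times a statement in a fresh interpreter each run, so nothing is cached in `sys.modules`. The helper names `time_import` and `report` are made up for this illustration and are not part of Bokeh.

```python
import subprocess
import sys
import time


def time_import(statement, repeats=3):
    """Best wall-clock time in seconds to run `statement` in a fresh interpreter.

    Returns None if the statement fails (for example, the package is not installed).
    """
    best = None
    for _ in range(repeats):
        start = time.perf_counter()
        proc = subprocess.run(
            [sys.executable, "-c", statement],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        elapsed = time.perf_counter() - start
        if proc.returncode != 0:
            return None
        best = elapsed if best is None else min(best, elapsed)
    return best


def report(statements, repeats=3):
    """Print each statement's time next to the bare interpreter startup time."""
    startup = time_import("pass", repeats)
    print(f"interpreter startup: {startup * 1000:.0f} ms")
    for stmt in statements:
        elapsed = time_import(stmt, repeats)
        if elapsed is None:
            print(f"{stmt!r}: failed (package missing?)")
        else:
            print(f"{stmt!r}: {elapsed * 1000:.0f} ms")
```

Calling `report(["import numpy", "import pandas", "from bokeh.plotting import figure"])` should give numbers comparable in spirit to the `%timeit` runs, although absolute values depend heavily on the machine and disk cache.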
That said, Python 3.7 has a new import profiling mode (-X importtime). Quickly, without yet looking at them in any detail, here are the results that might point to some tweaks that can be made:
EDIT: add spoiler tag to details
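Since the raw profile is long, a short script can help pick out the most expensive modules. This is only a sketch: it assumes the `import time: self [us] | cumulative | imported package` line format that CPython writes to stderr under `-X importtime`, and the function names `import_profile` and `slowest` are invented for the example.

```python
import subprocess
import sys


def import_profile(statement):
    """Run `statement` under -X importtime; return (module, self_us, cumulative_us) rows."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    prefix = "import time:"
    for line in proc.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        self_us, cumulative_us, name = parts
        try:
            rows.append((name.strip(), int(self_us), int(cumulative_us)))
        except ValueError:
            # The header line has text instead of numbers; skip it.
            continue
    return rows


def slowest(rows, n=15):
    """Return the n rows with the largest self time, largest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:n]
```

For example, `slowest(import_profile("from bokeh.plotting import figure"))` lists the modules whose own top-level code takes longest, which is usually more actionable than the cumulative column.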
Ah, you already posted that, I was not familiar with the GH spoiler tag option.
Is there any chance you could make pandas imports lazy and only do the import when it's actually required?
Not without technically making a breaking change. The module global DATETIME_TYPES is conditionally populated based on whether or not pandas is present:
@samuelcolvin here is what I would say could help move this along. If you want to experiment with making all the imports conditional and see if that can improve things, that would be a great first step. You'd look to move all of these:
pd = import_optional('pandas')
of which there are maybe 10-12 places in the codebase.
For the one mentioned above, it's possible the import of bokeh.util.serialization can be made lazy (avoiding any breaking change altogether) or the DATETIME_TYPES could become a function as an experiment for now to see if the lazy pd works. Can you report back findings here?
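To make the "DATETIME_TYPES as a function" experiment concrete, here is a rough sketch of what it might look like. Everything in it is an assumption for illustration: `import_optional_lazy` and `datetime_types` are hypothetical names, and the set of pandas types is a guess at what the real module global contains, not a copy of Bokeh's code.

```python
import datetime as dt
import functools
import importlib


@functools.lru_cache(maxsize=None)
def import_optional_lazy(mod_name):
    """Hypothetical helper: import `mod_name` on first call, or return None if missing."""
    try:
        return importlib.import_module(mod_name)
    except ImportError:
        return None


def datetime_types():
    """Build the set of datetime-like types, importing pandas only when called."""
    types = {dt.datetime, dt.date, dt.time}
    pd = import_optional_lazy("pandas")  # deferred until this function runs
    if pd is not None:
        # Assumed membership; the real DATETIME_TYPES may differ.
        types |= {pd.Timestamp, pd.Timedelta, pd.Period, type(pd.NaT)}
    return types
```

With this shape, importing the module costs nothing pandas-related; the cost is paid the first time `datetime_types()` is called. The downside is the one already noted: callers that read a module global named DATETIME_TYPES would have to change.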
https://github.com/samuelcolvin/notbook is the reason I was asking this, would love your feedback @bryevdv.
I'm pretty busy right now, but I think bokeh is great; I'll try to work on it when I get a chance.
Now that module-level __getattr__ (PEP 562) is an option, it's possible DATETIME_TYPES could even be handled without any breaking changes. Triaging to 3.0 to at least experiment.
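A sketch of how the module `__getattr__` idea could keep DATETIME_TYPES as an attribute while still deferring the pandas import. In a real package the hook would simply be a top-level `def __getattr__(name)` in the module file; the throwaway module `demo` below only exists so the example is self-contained. The names and the set of pandas types are assumptions for illustration, not Bokeh's implementation.

```python
import datetime as dt
import importlib
import types


def _build_datetime_types():
    """Compute the type set, importing pandas only if it is installed."""
    result = {dt.datetime, dt.date, dt.time}
    try:
        pd = importlib.import_module("pandas")
    except ImportError:
        return result
    # Assumed membership; the real DATETIME_TYPES may differ.
    return result | {pd.Timestamp, pd.Timedelta, pd.Period, type(pd.NaT)}


# Stand-in for a module file such as a hypothetical serialization module.
demo = types.ModuleType("demo_serialization")


def _module_getattr(name):
    """Called only when normal attribute lookup on the module fails."""
    if name == "DATETIME_TYPES":
        value = _build_datetime_types()
        setattr(demo, name, value)  # cache so the hook runs only once
        return value
    raise AttributeError(f"module {demo.__name__!r} has no attribute {name!r}")


# Python 3.7+ looks up __getattr__ in the module's namespace on a failed lookup.
demo.__getattr__ = _module_getattr
```

Existing code that reads `demo.DATETIME_TYPES` keeps working unchanged, and pandas is imported only on the first such access. One caveat: code inside the module that refers to the bare global name would not trigger the hook, since the hook applies to attribute access on the module object.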
With #11891, initial import times on branch-3.0 are down to ~350 ms (from >1 s before), which is probably about the best we can do as long as numpy is an unconditional import (and I don't see a way to avoid that).
ALL software version info (bokeh, python, notebook, OS, browser, any other relevant packages)
bokeh 2.0.2
python 3.8.2
Description of expected behavior and the observed behavior
Importing bokeh is very slow, around 1s for me. This is enough to significantly damage the developer experience.
Complete, minimal, self-contained example code that reproduces the issue
python -c 'print("hello")' > 0.02s user 0.01s system 91% cpu 0.035 total
python -c 'from bokeh.plotting import figure' 0.98s user 0.21s system 115% cpu 1.038 total
python simple_plot.py 1.08s user 0.30s system 126% cpu 1.084 total
where simple_plot.py is:
This is enough to be visibly slow to the user.
Output of python -X importtime -c 'from bokeh.plotting import figure':