civisanalytics / civis-python

Civis API Python Client
BSD 3-Clause "New" or "Revised" License

[CIVIS-8658] PERF speed up startup time #490

Closed jacksonlee-civis closed 5 months ago

jacksonlee-civis commented 5 months ago

Just running import civis in a fresh Python interpreter session takes a long time. This is especially noticeable in an interactive environment (the regular Python interactive interpreter, IPython, or a Jupyter notebook), which hurts the user experience. This pull request applies lazy imports to the top-level civis.* modules, for a speedup of about 5x. The long startup time came from eagerly importing heavy dependencies: packages like pandas through civis.io, and other pydata packages (numpy, scipy) through civis.ml.
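For readers unfamiliar with the technique, here is a minimal sketch of one common way to make a package attribute lazy: a module-level __getattr__ (PEP 562). It is not necessarily the mechanism this pull request uses. The names make_lazy_package, demo_pkg, and heavy are hypothetical, and the stdlib module colorsys stands in for a heavy dependency such as pandas.

```python
import importlib
import sys
import types


def make_lazy_package(name, lazy_map):
    """Build a stand-in package whose attributes in `lazy_map` are imported
    on first access. `lazy_map` maps attribute name -> real module name.
    (Hypothetical helper, for illustration only.)"""
    pkg = types.ModuleType(name)

    def __getattr__(attr):
        # PEP 562: called only when normal attribute lookup on the module fails.
        target = lazy_map.get(attr)
        if target is None:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module = importlib.import_module(target)
        # Cache on the package so later lookups bypass __getattr__ entirely.
        setattr(pkg, attr, module)
        return module

    pkg.__getattr__ = __getattr__
    return pkg


# "colorsys" is only a placeholder for an expensive import.
pkg = make_lazy_package("demo_pkg", {"heavy": "colorsys"})
print("heavy" in vars(pkg))  # False: nothing has been imported yet
module = pkg.heavy           # first access triggers the real import
print("heavy" in vars(pkg))  # True: the module is now cached on the package
print(module is sys.modules["colorsys"])
```

In a real package the same __getattr__ function would sit directly in the package's __init__.py, so that the import cost is paid only by users who actually touch the heavy submodule.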

We can use python -X importtime -c 'import civis' to get the import time details (times are reported in microseconds). Before this pull request (for brevity, showing only the final line, for the top-level civis import):

import time: self [us] | cumulative | imported package
...
import time:     14290 |    3144744 | civis

After this PR:

import time: self [us] | cumulative | imported package
...
import time:      5978 |     606370 | civis
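To make the comparison reproducible, here is a rough sketch of a script that extracts the cumulative figure from the -X importtime output and computes the ratio. The helper names parse_importtime and measure_import are hypothetical, and the parsing assumes the three-column layout shown above.

```python
import subprocess
import sys


def parse_importtime(stderr_text, package):
    """Return (self_us, cumulative_us) for `package` from -X importtime output."""
    prefix = "import time:"
    for line in stderr_text.splitlines():
        if not line.startswith(prefix):
            continue
        fields = line[len(prefix):].split("|")
        if len(fields) != 3:
            continue
        self_us, cumulative_us, name = (field.strip() for field in fields)
        # The header row has name "imported package", so it never matches.
        if name == package:
            return int(self_us), int(cumulative_us)
    raise ValueError(f"no import time line found for {package!r}")


def measure_import(package):
    """Import `package` in a fresh interpreter and return its timings."""
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {package}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # -X importtime writes its report to stderr.
    return parse_importtime(result.stderr, package)


# The two lines quoted in this pull request description.
before = "import time:     14290 |    3144744 | civis"
after = "import time:      5978 |     606370 | civis"
_, before_us = parse_importtime(before, "civis")
_, after_us = parse_importtime(after, "civis")
print(f"{before_us / 1e6:.2f} s -> {after_us / 1e6:.2f} s")
print(f"speedup: {before_us / after_us:.2f}x")
```

Calling measure_import("civis") on each branch would give fresh numbers. Individual runs vary with disk caching, so it is worth repeating the measurement a few times.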

It may be possible to speed things up further by making a few more imports along the import paths lazy. I tried that with some success, but I then had to make other changes to keep the test suite passing (e.g., a ModuleNotFoundError for existing, working imports, which tweaking the import syntax here and there might resolve). That suggested to me that it's not worth the effort. Keeping the lazy import implementation simple for a 5x speedup is good enough.