Speedup package import time

danielhollas commented 7 months ago

We've been looking at the import time of the mlptrain package, which takes over a second on the cluster and around 620 ms on my dev machine with an NVMe drive. Importing autode by itself takes 465 ms on the main branch.
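For anyone wanting to reproduce numbers like these, here is a minimal sketch that times an import in a fresh interpreter, so that modules already cached in the current session do not skew the result. The helper name `import_time_ms` is hypothetical and the snippet uses stdlib modules as stand-ins; the figures quoted above were not necessarily obtained this way.

```python
import subprocess
import sys


def import_time_ms(module: str) -> float:
    """Return the wall-clock time (ms) of `import module` in a new interpreter.

    Hypothetical helper for illustration; not part of autode or mlptrain.
    """
    code = (
        "import time; t0 = time.perf_counter(); "
        f"import {module}; "
        "print((time.perf_counter() - t0) * 1000)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        check=True,
        capture_output=True,
        text=True,
    )
    # Use the last line in case the imported module prints on import
    return float(result.stdout.strip().splitlines()[-1])


if __name__ == "__main__":
    # Stand-in modules; replace with the package of interest
    for name in ("json", "decimal"):
        print(f"{name}: {import_time_ms(name):.1f} ms")
```

For a per-module breakdown, CPython's built-in `python -X importtime -c "import autode"` prints self and cumulative times for every imported module to stderr.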

One of the easy wins is to import matplotlib only when needed, which saves around 160 ms.
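The general pattern is a deferred import: the `import` statement moves from module level into the function that actually needs it. A minimal sketch, assuming a hypothetical plotting helper `plot_energies` (this is not a function from autode, and the PR's actual changes are not shown here):

```python
from typing import Sequence


def plot_energies(energies: Sequence[float], filename: str = "energies.png") -> None:
    """Save a simple line plot of energies (hypothetical helper).

    matplotlib is imported here rather than at module level, so importing
    the enclosing module does not pay the matplotlib import cost.
    """
    # Deferred import: only executed the first time the function is called
    from matplotlib.figure import Figure

    fig = Figure()
    ax = fig.subplots()
    ax.plot(range(len(energies)), energies, marker="o")
    ax.set_xlabel("Step")
    ax.set_ylabel("Energy")
    fig.savefig(filename)
```

After the first call Python caches the module in `sys.modules`, so repeated calls only pay for a dictionary lookup.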

Other potential improvements would come from delaying the import of scipy and/or RDKit. But those would require more changes; happy to open a separate PR if that is desired.

Corresponding PR on mlptrain repo: https://github.com/duartegroup/mlp-train/pull/84

codecov[bot] commented 7 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (05b2d98) 97.44% compared to head (da39a40) 97.44%.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           v1.4.2     #319   +/-   ##
=======================================
  Coverage   97.44%   97.44%
=======================================
  Files         208      209    +1
  Lines       23728    23746   +18
=======================================
+ Hits        23122    23140   +18
  Misses        606      606
```

| [Flag](https://app.codecov.io/gh/duartegroup/autodE/pull/319/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=duartegroup) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/duartegroup/autodE/pull/319/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=duartegroup) | `97.44% <100.00%> (+<0.01%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=duartegroup#carryforward-flags-in-the-pull-request-comment) to find out more.


t-young31 commented 7 months ago

Hi @danielhollas – thanks for the PR

I'm not sure about this change, given the (admittedly small) increase in complexity for a saving of O(ms) in a package that tends to execute (with external QM calculations) in O(h)

juraskov commented 7 months ago

Hi Tom, thanks for your comment. I agree that the change in timing is small compared to the cost of QM computations. However, we sometimes import autode when debugging in interactive mode, and there the import time is noticeable. This is what motivated the change in the first place. We also noticed that importing autode slows down the import of mlptrain. We are planning to update the version of autode used in mlptrain soon, and I think it would be good if that update also improved the import speed.

t-young31 commented 7 months ago

I'm not sure you'll notice a 0.1s change. Nevertheless, I don't think the overhead of remembering to import matplotlib lazily is that high, so happy to merge with a couple of edits. @danielhollas would you mind:

- [x] Updating the changelog
- [x] Targeting the v1.4.2 branch instead of master
- [x] Adding yourself to the contributors list

danielhollas commented 7 months ago

Done.

> Nevertheless, I don't think the overhead of remembering to import matplotlib lazily is that high

Happy to contribute a test that will check that matplotlib is not loaded after autode import.

t-young31 commented 7 months ago

> Happy to contribute a test that will check that matplotlib is not loaded after autode import

Yes please 👍🏼

danielhollas commented 7 months ago

@t-young31 I've added a test and verified that it fails on the main branch and passes here.
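The test added in the PR is not reproduced in this thread. One way such a check could look is sketched below. It runs the import in a subprocess because `sys.modules` is shared across the whole test session, so an earlier test that plots something would otherwise leave matplotlib loaded. The helper name `loaded_after_import` is hypothetical.

```python
import subprocess
import sys


def loaded_after_import(package: str, forbidden: str) -> bool:
    """Return True if importing `package` in a fresh interpreter also
    loads the module `forbidden`.

    Hypothetical helper for illustration; not the test from the PR.
    """
    code = (
        f"import sys, {package}; "
        f"print({forbidden!r} in sys.modules)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        check=True,
        capture_output=True,
        text=True,
    )
    # Use the last line in case the imported package prints on import
    return result.stdout.strip().splitlines()[-1] == "True"
```

In a pytest suite this would presumably be wrapped in a test function whose body is `assert not loaded_after_import("autode", "matplotlib")`, which should fail on a branch that imports matplotlib at module level and pass once the import is deferred.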

duartegroup / autodE

Speedup package import time #319
