Proposed changes
Pulling large amounts of data into memory and normalizing it into a pandas.DataFrame turns out to be painfully slow :sweat_smile:. This makes report generation slow and iterating on reports painful, even with #71 in place.
This PR adds caching for functions that return data frames, using HDF files as the storage backend. It also makes some minimal changes to how we handle units with pint, which have a major impact on performance (according to cProfile and my own observations). Builds on #71.
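Below is a minimal sketch of the caching idea, assuming a decorator-based design. `hdf_cache` and `load_frame` are hypothetical names, not the API this PR adds. Each distinct call is stored in the HDF file under a key derived from the function name and the `repr` of its arguments, so repeating a call reads the stored frame instead of recomputing it. It uses pandas' `DataFrame.to_hdf` and `HDFStore`, which require the optional PyTables (`tables`) package.

```python
import functools
import hashlib
import os
import tempfile

import pandas as pd  # HDF I/O additionally requires the optional "tables" package


def hdf_cache(path):
    """Hypothetical decorator: cache a DataFrame-returning function in an HDF file."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Derive one HDF key per distinct call; hashing keeps the node name
            # short, and the "k" prefix keeps it from starting with a digit.
            signature = repr((func.__qualname__, args, sorted(kwargs.items())))
            key = "k" + hashlib.sha1(signature.encode()).hexdigest()
            if os.path.exists(path):
                with pd.HDFStore(path, mode="r") as store:
                    if key in store:
                        return store[key]  # cache hit: read the stored frame
            result = func(*args, **kwargs)  # cache miss: compute the frame
            result.to_hdf(path, key=key, mode="a")  # append it to the cache file
            return result
        return wrapper
    return decorator


# Usage: the second call with the same argument is served from the HDF file.
calls = []


@hdf_cache(os.path.join(tempfile.mkdtemp(), "cache.h5"))
def load_frame(n):
    calls.append(n)  # records real (uncached) executions only
    return pd.DataFrame({"x": range(n), "y": [i * i for i in range(n)]})


first = load_frame(3)
second = load_frame(3)
```

Keying on `repr` is a simplification: it assumes arguments have stable, complete representations, which does not hold for objects such as large DataFrames whose `repr` is truncated.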
Type of change
[ ] 🐛 Bugfix (change which fixes an issue)
[x] 🚀 Feature (change which adds functionality)
[ ] 📚 Documentation (change which fixes or extends documentation)
Checklist
Put an x in the boxes that apply. This is simply a reminder of what we will require before merging your code.
[x] Lint and unit tests (if any) pass locally with my changes
[x] I have added tests that prove my fix is effective or that my feature works
[x] I have added necessary documentation (if appropriate)