Ekumen-OS / lambkin

Apache License 2.0

Cache `DataFrames` for processing speed #72

Closed by hidmic 11 months ago

hidmic commented 11 months ago

Proposed changes

It turns out that pulling vast amounts of data into memory and normalizing it into a `pandas.DataFrame` is painfully slow :sweat_smile:. That makes report generation slow too, so iterating on a report stays tedious even with #71 in place.

This PR adds caching support for functions that yield data frames, using HDF files as the backing store. It also makes a few small changes to how we handle units with `pint`, which improve performance substantially (according to cProfile and my own observations). Builds on #71.
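To illustrate the general idea (this is not the code in this PR), here is a minimal sketch of caching a data frame producer on disk. The names `cached_dataframe` and `load_samples`, the cache key scheme, and the pickle fallback are all assumptions made for the example. It relies only on `DataFrame.to_hdf` and `pandas.read_hdf`, which need PyTables installed.

```python
import functools
import hashlib
import os
import tempfile

import pandas as pd

try:
    import tables  # noqa: F401  (PyTables, which pandas needs for HDF I/O)
    HAVE_HDF = True
except ImportError:
    HAVE_HDF = False  # assumption for this sketch: fall back to pickle


def cached_dataframe(cache_dir):
    """Hypothetical decorator: cache a function's DataFrame result on disk."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Key the cache entry on the function name and its arguments.
            raw = repr((func.__qualname__, args, sorted(kwargs.items())))
            digest = hashlib.sha256(raw.encode()).hexdigest()[:16]
            ext = 'h5' if HAVE_HDF else 'pkl'
            path = os.path.join(cache_dir, f'{func.__name__}-{digest}.{ext}')
            if os.path.exists(path):
                # Cache hit: skip the expensive computation entirely.
                if HAVE_HDF:
                    return pd.read_hdf(path, key='data')
                return pd.read_pickle(path)
            df = func(*args, **kwargs)
            os.makedirs(cache_dir, exist_ok=True)
            if HAVE_HDF:
                df.to_hdf(path, key='data', mode='w')
            else:
                df.to_pickle(path)
            return df
        return wrapper
    return decorator


calls = []  # records how many times the slow path actually runs
cache_dir = tempfile.mkdtemp()


@cached_dataframe(cache_dir)
def load_samples(n):
    calls.append(n)
    return pd.DataFrame({'t': range(n), 'x': [0.5 * i for i in range(n)]})


first = load_samples(3)   # computed, then written to the cache
second = load_samples(3)  # read back from the cache
```

A sketch like this never invalidates stale entries. A real implementation would also need to account for changes in the underlying data or in the function itself.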
