CouncilDataProject / cdp-data

Data Utilities and Processing Generalized for All CDP Instances
https://councildataproject.org/cdp-data
MIT License

Prototype delayed dask dataframes for better computation scaling #9

Open evamaxfield opened 2 years ago

evamaxfield commented 2 years ago

Currently, all computation is multithreaded by default, which leaves the user little room to decide how and when they want to actually gather data or compute a result.

The datasets module can easily be switched over to dask for data gathering and caching.
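A minimal sketch of what lazy data gathering could look like with `dask.delayed`. The `fetch_sessions` function and the instance names are hypothetical placeholders, not part of the cdp-data API; the point is only that building the task list does no work until the user asks for it.

```python
import dask
from dask import delayed


def fetch_sessions(instance: str) -> list[dict]:
    """Hypothetical stand-in for a per-instance data fetch (not a cdp-data function)."""
    return [{"instance": instance, "session_id": i} for i in range(3)]


# Placeholder instance names, used for illustration only.
instances = ["seattle", "portland"]

# Wrapping the call in `delayed` only records the task; nothing is fetched yet.
lazy_results = [delayed(fetch_sessions)(name) for name in instances]

# The user decides when to run and with which scheduler
# ("threads", "processes", or "synchronous").
results = dask.compute(*lazy_results, scheduler="threads")
```

Here the caller, rather than the library, picks the point of execution and the scheduler, which is the control the current multithreaded default does not offer.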

The keywords module should be reviewed to see how dask dataframes could be used for just-in-time and out-of-core (larger-than-memory) compute.