CouncilDataProject / cdp-data

Data Utilities and Processing Generalized for All CDP Instances
https://councildataproject.org/cdp-data
MIT License

Add a `dump_to_sqlite` function to the library #13

Open evamaxfield opened 2 years ago

evamaxfield commented 2 years ago

Feature Description

Add a function to dump all data stored in a CDP Firestore database to a local sqlite file.

Use Case

A lot more people know how to use SQL than our weird combination of Firestore + ORM in Python. And for tabular data (voting, legislation, people info, etc.), SQL is likely the best choice for quick and easy processing. I believe there are also visualization tools that can read sqlite files directly.

Solution

Add a function to the library (a prototype is fine for now) that takes the name of the CDP instance the user wants to dump and the file path where the sqlite file should be written, something like:

def dump_to_sqlite(instance: str, path: Union[str, Path]):

The function iterates through each collection, requests data from Firestore in batches, and writes each batch to the sqlite file.
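A rough sketch of the write side only, to show the batching idea. It deliberately does not touch Firestore or FireO: each collection is passed in as a plain iterable of dicts, which is an assumption standing in for whatever the batched query ends up returning. The names `batched`, `write_collection`, and `dump_rows_to_sqlite` are hypothetical and not part of cdp-data, and the columns are created without types just to keep the example short.

```python
import sqlite3
from itertools import islice
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator, List, Tuple, Union

Row = Dict[str, Any]


def batched(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def write_collection(
    conn: sqlite3.Connection,
    table: str,
    columns: List[str],
    rows: Iterable[Row],
    batch_size: int = 500,
) -> int:
    """Create `table` if needed, insert `rows` in batches, return the row count."""
    column_sql = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_sql})')
    written = 0
    for batch in batched(rows, batch_size):
        # One executemany + commit per batch keeps memory use bounded.
        conn.executemany(
            f'INSERT INTO "{table}" ({column_sql}) VALUES ({placeholders})',
            [tuple(row.get(c) for c in columns) for row in batch],
        )
        conn.commit()
        written += len(batch)
    return written


def dump_rows_to_sqlite(
    path: Union[str, Path],
    collections: Dict[str, Tuple[List[str], Iterable[Row]]],
    batch_size: int = 500,
) -> Dict[str, int]:
    """Write each (columns, rows) pair to its own table in the sqlite file.

    `collections` maps a table name to its column names and an iterable of
    rows. In the real function the rows would come from Firestore (assumption).
    """
    conn = sqlite3.connect(str(path))
    try:
        return {
            table: write_collection(conn, table, columns, rows, batch_size)
            for table, (columns, rows) in collections.items()
        }
    finally:
        conn.close()
```

Because the rows are consumed lazily, a generator that pages through a collection could be passed in directly, so only one batch is held in memory at a time.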

Notes

I assume the database models themselves should stay the same: schema-diagram & model-docs

Since we use FireO for our "Firestore ORM" -- their docs on querying data (including batched) are likely important: https://octabyte.io/FireO/querying-data

An example of using the FireO models can be seen in this notebook or in our source code

I say just use the sqlite3 library that ships with Python?