catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
456 stars 106 forks

Create data packaging scripts for export #211

Closed zaneselvans closed 5 years ago

zaneselvans commented 5 years ago

Create data packages containing all the main tables from the PUDL database and appropriate table schemas defining their relationships. This should include the data tables from:

It should also include the tabular outputs from the outputs module, like MCOE.
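
As a rough illustration of what "table schemas defining their relationships" could look like, here is a minimal sketch of a tabular data package descriptor built with only the standard library. The table names (`plants`, `generators`), their fields, the `data/` path layout, and the `make_resource` helper are all hypothetical and are not taken from the PUDL database. The descriptor keys (`resources`, `schema`, `primaryKey`, `foreignKeys`) follow the Frictionless Data specifications.

```python
import json


def make_resource(name, fields, primary_key, foreign_keys=None):
    """Build a tabular data resource descriptor for one table (hypothetical helper)."""
    schema = {"fields": fields, "primaryKey": primary_key}
    if foreign_keys:
        schema["foreignKeys"] = foreign_keys
    return {
        "name": name,
        "path": f"data/{name}.csv",  # assumed layout: one CSV per table
        "profile": "tabular-data-resource",
        "schema": schema,
    }


# Hypothetical parent table.
plants = make_resource(
    "plants",
    fields=[
        {"name": "plant_id", "type": "integer"},
        {"name": "plant_name", "type": "string"},
    ],
    primary_key=["plant_id"],
)

# Hypothetical child table whose plant_id must exist in the plants table.
generators = make_resource(
    "generators",
    fields=[
        {"name": "plant_id", "type": "integer"},
        {"name": "generator_id", "type": "string"},
        {"name": "capacity_mw", "type": "number"},
    ],
    primary_key=["plant_id", "generator_id"],
    foreign_keys=[
        {
            "fields": ["plant_id"],
            "reference": {"resource": "plants", "fields": ["plant_id"]},
        }
    ],
)

package = {
    "name": "pudl-example",  # placeholder package name
    "profile": "tabular-data-package",
    "resources": [plants, generators],
}

# This string is what would be written out as datapackage.json.
descriptor_json = json.dumps(package, indent=2)
```

The foreign key on `generators` is what records the relationship between the two tables, so a validator can check referential integrity without access to the original database.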

See our MSHA data packaging script as a template, and also check ou

zaneselvans commented 5 years ago

There's now a bunch of infrastructure in pudl.output.export for exporting data packages pretty automatically. The main thing we need to do at this point is develop a metadata library that can store the pieces of information we don't have in the database in a machine readable way. Maybe a metadata directory in the datastore? Or a top level metadata directory? Or create a metadata module that can gobble up all the crap that's accumulated in constants.py and organize it better, along with a collection of CSV files and JSON files that store that information more appropriately?
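
One possible shape for the "metadata directory" idea, sketched under assumptions: each table gets a JSON file named after it, and a small loader reads them all into a dictionary that an export step could merge into the package descriptor. The `load_table_metadata` function, the file layout, and the example contents are hypothetical and do not describe anything that exists in pudl.

```python
import json
import tempfile
from pathlib import Path


def load_table_metadata(metadata_dir):
    """Read every <table>.json file in metadata_dir into a dict keyed by table name."""
    metadata = {}
    for path in sorted(Path(metadata_dir).glob("*.json")):
        with path.open() as f:
            metadata[path.stem] = json.load(f)
    return metadata


# Demo with a throwaway directory standing in for the metadata directory.
with tempfile.TemporaryDirectory() as tmp:
    example = {
        "title": "Power plants",  # placeholder values, not real PUDL metadata
        "description": "One row per plant.",
    }
    (Path(tmp) / "plants.json").write_text(json.dumps(example))
    table_metadata = load_table_metadata(tmp)
```

Keeping this information in plain JSON or CSV files rather than in Python constants would make it readable by tools other than the package itself, which is the point of storing it in a machine-readable way.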

zaneselvans commented 5 years ago

This issue was closed because, after thinking it over and discussing it in #258, we decided to make frictionless data packages the canonical output, so this task is no longer necessary.