zaneselvans closed this issue 5 years ago
There's now a bunch of infrastructure in pudl.output.export for exporting data packages pretty automatically. The main thing we need to do at this point is develop a metadata library that can store the pieces of information we don't have in the database in a machine-readable way. Maybe a metadata directory in the datastore? Or a top-level metadata directory? Or create a metadata module that can gobble up all the crap that's accumulated in constants.py and organize it better, along with a collection of CSV and JSON files that store that information more appropriately?
This issue was closed because, after thinking about it and discussing it on #258, we've decided to make frictionless data packages the canonical output, so this task is no longer necessary.
Create data packages containing all the main tables from the PUDL database and appropriate table schemas defining their relationships. This should include the data tables from:
It should also include the tabular outputs from the outputs module, like MCOE.
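For anyone unfamiliar with the format, here is a rough sketch of what a descriptor with table schemas and a relationship between two tables could look like, built with nothing but the standard library. The table names, column names, file paths, and package name are all made up for illustration and are not the actual PUDL tables; the keys (`resources`, `schema`, `primaryKey`, `foreignKeys`) follow the Frictionless Data Package and Table Schema specs.

```python
import json


def build_descriptor():
    """Build a minimal tabular data package descriptor with two related tables.

    All names here (plants, generators, plant_id, ...) are hypothetical
    placeholders, not the real PUDL table or column names.
    """
    plants = {
        "name": "plants",
        "path": "data/plants.csv",
        "profile": "tabular-data-resource",
        "schema": {
            "fields": [
                {"name": "plant_id", "type": "integer"},
                {"name": "plant_name", "type": "string"},
            ],
            "primaryKey": ["plant_id"],
        },
    }
    generators = {
        "name": "generators",
        "path": "data/generators.csv",
        "profile": "tabular-data-resource",
        "schema": {
            "fields": [
                {"name": "generator_id", "type": "string"},
                {"name": "plant_id", "type": "integer"},
                {"name": "capacity_mw", "type": "number"},
            ],
            "primaryKey": ["plant_id", "generator_id"],
            # Each generator row must point at an existing row in plants.
            "foreignKeys": [
                {
                    "fields": ["plant_id"],
                    "reference": {"resource": "plants", "fields": ["plant_id"]},
                }
            ],
        },
    }
    return {
        "name": "pudl-example",
        "profile": "tabular-data-package",
        "resources": [plants, generators],
    }


descriptor = build_descriptor()
# This string is what would be written out as datapackage.json.
descriptor_json = json.dumps(descriptor, indent=2)
```

Writing `descriptor_json` to a `datapackage.json` file next to the CSVs would give a package that the frictionless tooling can validate, including the foreign key relationship.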
See our MSHA data packaging script as a template, and also check ou