compartmentalize ETL & glue based on data sets

catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.

MIT License

We want users to be able to ingest only the data sets they care about. To do that, the ETL process should be partitioned so that anyone can ingest just the data sets they are working with, plus only the glue that connects those data sets. Right now the only sources are FERC, CEMS, and EIA, but the partitioning should make it possible to ingest, for example, EIA 860 and CEMS and nothing else.
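One way to picture the glue selection is the minimal sketch below. It assumes each glue step declares which data sources it links; the source names and the `GLUE_STEPS` registry are hypothetical and do not reflect how PUDL is actually organized.

```python
# Hypothetical registry: each glue step names the data sources it connects.
GLUE_STEPS = {
    "ferc1_eia860": {"ferc1", "eia860"},
    "eia860_epacems": {"eia860", "epacems"},
}


def select_glue(requested_sources):
    """Return the glue steps whose linked sources were all requested."""
    requested = set(requested_sources)
    # A glue step is usable only if every source it links is being ingested.
    return sorted(
        name for name, linked in GLUE_STEPS.items() if linked <= requested
    )
```

With this shape, requesting only `eia860` and `epacems` would pick up the glue between those two and skip anything that touches FERC.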

The years passed to init_db tell us whether a given data set is being ingested. From that, we can populate a small table with one record per data source and a boolean indicating whether it has been ingested, for later reference in outputs and analysis.
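A rough sketch of such a status table follows. It assumes the requested years arrive as a mapping from data source name to a list of years, with an empty list meaning "not ingested". The function names, table name, column names, and the use of SQLite as a stand-in database are all illustrative assumptions, not the actual init_db interface.

```python
import sqlite3


def record_ingest_status(conn, years_by_source):
    """Write one row per data source with a flag for whether it was ingested."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS datasets ("
        "datasource TEXT PRIMARY KEY, ingested INTEGER NOT NULL)"
    )
    # A source counts as ingested if at least one year was requested for it.
    rows = [
        (source, int(bool(years))) for source, years in years_by_source.items()
    ]
    conn.executemany("INSERT OR REPLACE INTO datasets VALUES (?, ?)", rows)
    conn.commit()


def ingested_sources(conn):
    """Return the names of the data sources flagged as ingested."""
    cur = conn.execute(
        "SELECT datasource FROM datasets WHERE ingested = 1 ORDER BY datasource"
    )
    return [row[0] for row in cur]
```

Output and analysis code could then call something like `ingested_sources(conn)` before touching a table, rather than guessing from which tables happen to be non-empty.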

compartmentalize ETL & glue based on data sets #195