catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Convert FERC ETL to assets #2085

Closed bendnorman closed 1 year ago

bendnorman commented 1 year ago

Convert ferc_to_sqlite and the ferc1 ETL to use dagster concepts.

bendnorman commented 1 year ago

I ended up creating a separate dagster Definitions object for the ferc_to_sqlite logic, with an op for the dbf2sqlite and xbrl2sqlite processes. We don't know the asset names before run time for the dbf2sqlite and xbrl2sqlite processes, so we can't use assets. We could maybe collect the dbf table names prior to extracting the data, but I'm not sure if this is possible for the xbrl data.
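To illustrate the "names are only known at run time" problem, here is a minimal sketch that lists the tables in a SQLite database after it has been populated, using only the standard library. The in-memory database and the table names `f1_steam` and `f1_fuel` are stand-ins for illustration, and `list_table_names` is a hypothetical helper, not part of PUDL.

```python
import sqlite3


def list_table_names(conn: sqlite3.Connection) -> list[str]:
    """Return the user-created table names in a SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Stand-in for the output of an extraction step (hypothetical tables).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE f1_steam (respondent_id INTEGER, report_year INTEGER)")
conn.execute("CREATE TABLE f1_fuel (respondent_id INTEGER, report_year INTEGER)")

# The names only become available once the database exists.
table_names = list_table_names(conn)
```

A lookup like this can only run once the extraction has finished, which is too late to declare one asset per table when the definitions are loaded.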

This actually creates a nice delineation between sections of our full data process. The ETL and the ferc extraction happen in separate Definitions. The ETL accesses the extracted ferc tables using SourceAssets. The jobs in the separate Definitions can then be linked together by sensors.