Closed by bendnorman.
I created #1324 for the unique dataset problem. It's pretty bizarre that PyYAML doesn't enforce unique keys.
The two remaining tasks relate to where we specify working tables and partitions. I decided to keep the `PUDL_TABLES` and `WORKING_PARTITIONS` dictionaries in `pudl.constants` because it was cleaner than including these constants in the pydantic settings models. I suppose we could infer working partitions from the `datapackage.json` files, but this could be an issue: we might archive all partitions of a dataset but only implement the ETL for some of them. If we inferred the partitions from `datapackage.json`, we would still need to specify the missing partitions in the code somewhere. This is discussed in #1264.
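A minimal sketch of the trade-off described above. It assumes a simplified `datapackage.json` layout in which each resource records its partition under a `parts` mapping (an assumption, not necessarily the real archive schema), and it uses a hypothetical `NOT_IMPLEMENTED` set to stand for the "missing partitions" that would still have to be specified in code:

```python
import json

# Hypothetical: partitions that are archived but not yet supported by the ETL.
# Even if partitions are inferred from the archive, this list must live in code.
NOT_IMPLEMENTED: dict[str, set[int]] = {"eia860": {2001, 2002}}


def archived_partitions(datapackage_text: str, key: str = "year") -> set[int]:
    """Collect every partition value listed in a (simplified) datapackage.json."""
    package = json.loads(datapackage_text)
    return {
        resource["parts"][key]
        for resource in package.get("resources", [])
        if key in resource.get("parts", {})
    }


def working_partitions(dataset: str, datapackage_text: str) -> list[int]:
    """Archived partitions minus the ones the ETL cannot process yet."""
    archived = archived_partitions(datapackage_text)
    return sorted(archived - NOT_IMPLEMENTED.get(dataset, set()))


if __name__ == "__main__":
    example = json.dumps(
        {"resources": [{"name": f"eia860-{y}.zip", "parts": {"year": y}} for y in range(2001, 2006)]}
    )
    print(working_partitions("eia860", example))  # [2003, 2004, 2005]
```

The point of the sketch is that inference only removes half of the bookkeeping: the archive says what exists, but the exclusion set still has to be maintained by hand.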
Requirements:
- … (`epacems`, `eia`, `ferc1`, etc.) not a list.
- … `eia860m` rather than `eia860_ytd` as a code for that EIA sub-dataset.
- … `all`, which causes all working data partitions to be processed.
- … `PUDL_TABLES` and `WORKING_PARTITIONS` dictionaries from `pudl.constants`, as that information will be subsumed within the ETL settings class / validation structures.
- … `PUDL_TABLES` and `WORKING_PARTITIONS`. Note that this is potentially distinct from what raw input data is available, and that information is already stored in the metadata of the raw input datapackages.
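The requirements above can be sketched as a small validation layer. This uses stdlib dataclasses as a stand-in for the pydantic settings models, and everything in it is hypothetical: the `WORKING_YEARS` values are illustrative, and the EIA sub-datasets (`eia860`, `eia860m`) are treated as top-level keys to keep the example short. It shows datasets given as a mapping keyed by name, `all` expanding to every working partition, and `eia860_ytd` being rejected in favor of `eia860m`:

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical working partitions (illustrative values only). Under the proposal
# this lives with the settings/validation code instead of pudl.constants.
WORKING_YEARS: dict[str, list[int]] = {
    "ferc1": [2019, 2020],
    "eia860": [2019, 2020],
    "eia860m": [2021],
    "epacems": [2019, 2020],
}


@dataclass
class DatasetSettings:
    """Stdlib stand-in for a pydantic settings model for one dataset."""

    name: str
    years: list[int] | str = "all"

    def __post_init__(self) -> None:
        if self.name == "eia860_ytd":
            raise ValueError("use 'eia860m' rather than 'eia860_ytd'")
        if self.name not in WORKING_YEARS:
            raise ValueError(f"unknown dataset: {self.name!r}")
        working = WORKING_YEARS[self.name]
        if self.years == "all":
            # 'all' expands to every working partition of this dataset.
            self.years = list(working)
        elif isinstance(self.years, str):
            raise ValueError(f"unsupported partition keyword: {self.years!r}")
        else:
            bad = sorted(set(self.years) - set(working))
            if bad:
                raise ValueError(f"{self.name}: non-working years {bad}")


def parse_etl_settings(raw: dict[str, dict]) -> dict[str, DatasetSettings]:
    """Validate datasets supplied as a mapping keyed by dataset name, not a list."""
    if not isinstance(raw, dict):
        raise TypeError("datasets must be a mapping keyed by dataset name")
    return {name: DatasetSettings(name=name, **opts) for name, opts in raw.items()}
```

For example, `parse_etl_settings({"ferc1": {"years": "all"}, "epacems": {"years": [2020]}})["ferc1"].years` evaluates to `[2019, 2020]`. In a pydantic version the same checks would typically sit in field validators, so the working partitions and the settings schema are defined in one place.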