catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Implement pydantic validation of existing settings file #1288

Closed: bendnorman closed this issue 2 years ago

bendnorman commented 2 years ago

Requirements:

bendnorman commented 2 years ago

#1292 closed out 10 of these 13 tasks.

I created #1324 to track the unique-dataset problem. It's pretty bizarre that PyYAML doesn't enforce unique keys.
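By default PyYAML silently keeps the last value when a mapping repeats a key, so a dataset listed twice in the settings file just overwrites the first entry. A minimal sketch of one workaround, assuming the settings are loaded with a `SafeLoader` subclass: override `construct_mapping` to raise on repeated keys. `UniqueKeyLoader` is a hypothetical name, not something in PUDL or PyYAML, and the check covers only explicitly written keys (keys pulled in through YAML merge keys are not compared).

```python
import yaml


class UniqueKeyLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader variant that rejects duplicate mapping keys."""

    def construct_mapping(self, node, deep=False):
        seen = set()
        # node.value is a list of (key_node, value_node) pairs in file order.
        for key_node, _value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            if key in seen:
                raise yaml.constructor.ConstructorError(
                    "while constructing a mapping",
                    node.start_mark,
                    f"found duplicate key {key!r}",
                    key_node.start_mark,
                )
            seen.add(key)
        # No duplicates: fall back to the normal SafeLoader behavior.
        return super().construct_mapping(node, deep=deep)
```

With this loader, `yaml.load(text, Loader=UniqueKeyLoader)` raises a `ConstructorError` that points at the line of the repeated key, while `yaml.safe_load(text)` on the same text would quietly drop the earlier value.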

The two remaining tasks concern where we specify working tables and partitions. I decided to keep the PUDL_TABLES and WORKING_PARTITIONS dictionaries in pudl.constants because that was cleaner than embedding these constants in the pydantic settings models. We could instead infer working partitions from the datapackage.json files, but that could cause problems: we might archive every partition of a dataset while only implementing the ETL for some of them. If we inferred the partitions from datapackage.json, we would still need to record the unimplemented partitions somewhere in the code. This is discussed in #1264.
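Keeping the constants outside the models still lets a pydantic validator check requested partitions against them. A minimal sketch, assuming a module-level dictionary shaped like `{dataset: {"years": [...]}}`: the `WORKING_PARTITIONS` contents, the `Eia860Settings` model, and the `years_must_be_working` validator are all hypothetical stand-ins, not PUDL's actual settings code. It uses the `validator` decorator, which exists in pydantic v1 and is kept (deprecated) in v2.

```python
from typing import Dict, List

from pydantic import BaseModel, Field, ValidationError, validator

# Hypothetical stand-in for pudl.constants.WORKING_PARTITIONS; the real
# dictionary's datasets and years may differ.
WORKING_PARTITIONS: Dict[str, Dict[str, List[int]]] = {
    "eia860": {"years": [2018, 2019, 2020]},
}


class Eia860Settings(BaseModel):
    """Hypothetical settings model for one dataset's ETL parameters."""

    # Default to every working year; copy the list so instances never
    # share (or mutate) the constant.
    years: List[int] = Field(
        default_factory=lambda: list(WORKING_PARTITIONS["eia860"]["years"])
    )

    @validator("years")
    def years_must_be_working(cls, years):
        # Reject any requested year that the ETL does not support yet.
        working = set(WORKING_PARTITIONS["eia860"]["years"])
        unsupported = sorted(set(years) - working)
        if unsupported:
            raise ValueError(f"years {unsupported} are not working partitions")
        return years
```

For example, `Eia860Settings(years=[2019])` validates, while `Eia860Settings(years=[2019, 2031])` raises a `ValidationError`. If partitions were inferred from datapackage.json instead, only the source of `working` would change: it would become the archived partitions minus a hard-coded set of unimplemented ones.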