bendnorman commented 2 years ago

#1292 made it easier to validate new dataset settings. If we add a new dataset, we will have to:

1. Add the dataset's working tables to pc.PUDL_TABLES.
2. Add the dataset's working partitions to pc.WORKING_PARTITIONS.
3. Create a settings class by calling NewDatasetSettings = create_dataset_settings("new_dataset").
4. Add any additional validation methods to NewDatasetSettings.
5. Add NewDatasetSettings to DatasetSettings.
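
For readers unfamiliar with the workflow, here is a rough, stdlib-only sketch of what those steps amount to. It is not the PUDL implementation: the real settings classes are pydantic models, and the body of create_dataset_settings, the base class, and the table and year values below are all assumptions made up for illustration. Step 4 (extra validation methods) is left out.

```python
# Stand-ins for pc.PUDL_TABLES and pc.WORKING_PARTITIONS; the contents are made up.
PUDL_TABLES = {"new_dataset": ("table_a", "table_b")}          # step 1
WORKING_PARTITIONS = {"new_dataset": {"years": (2019, 2020)}}  # step 2


class GenericDatasetSettings:
    """Toy base class that rejects tables and years outside the working sets."""

    dataset_name = None

    def __init__(self, tables=None, years=None):
        working_tables = PUDL_TABLES[self.dataset_name]
        working_years = WORKING_PARTITIONS[self.dataset_name]["years"]
        # Default to everything that is known to work.
        self.tables = list(working_tables if tables is None else tables)
        self.years = list(working_years if years is None else years)
        for label, given, allowed in (
            ("tables", self.tables, working_tables),
            ("years", self.years, working_years),
        ):
            unknown = sorted(set(given) - set(allowed))
            if unknown:
                raise ValueError(f"{self.dataset_name}: unsupported {label} {unknown}")


def create_dataset_settings(dataset_name):
    """Build a settings class for one dataset (hypothetical implementation)."""
    class_name = "".join(p.capitalize() for p in dataset_name.split("_")) + "Settings"
    return type(class_name, (GenericDatasetSettings,), {"dataset_name": dataset_name})


NewDatasetSettings = create_dataset_settings("new_dataset")    # step 3


class DatasetSettings:
    """Toy container; step 5 is adding the new class here by hand."""

    def __init__(self, new_dataset=None):
        self.new_dataset = new_dataset or NewDatasetSettings()
```

With this toy version, NewDatasetSettings(years=[1999]) raises a ValueError because 1999 is not among the working years, which is the kind of check the settings classes exist to perform.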

This isn’t that bad, but I think we could improve in a few areas:

- When a user adds a new dataset, they will have to specify the table schemas. For each dataset, pc.WORKING_TABLES could be inferred from the dataset's table schemas.
- Instead of writing a separate create_dataset_settings(dataset_name) call by hand for every dataset in pc.WORKING_PARTITIONS, we could just loop over the datasets in pc.WORKING_PARTITIONS and create the classes there.
- DatasetSettings could be dynamically generated by finding all subclasses of GenericDatasetSettings.
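
To make the three ideas concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: TABLE_SCHEMAS, the dataset names, and the toy GenericDatasetSettings and create_dataset_settings are hypothetical stand-ins, not the actual PUDL code, where the settings classes are pydantic models.

```python
# Hypothetical stand-ins; the names and contents are made up for this sketch.
TABLE_SCHEMAS = {
    "dataset_a": {"plants_a": ["plant_id", "year"], "units_a": ["unit_id"]},
    "dataset_b": {"plants_b": ["plant_id"]},
}
WORKING_PARTITIONS = {
    "dataset_a": {"years": (2019, 2020)},
    "dataset_b": {"years": (2020,)},
}

# Idea 1: infer each dataset's working tables from its table schemas.
WORKING_TABLES = {name: tuple(schemas) for name, schemas in TABLE_SCHEMAS.items()}


class GenericDatasetSettings:
    """Toy base class standing in for the real settings base class."""

    dataset_name = None


def create_dataset_settings(dataset_name):
    """Build a settings class for one dataset (hypothetical implementation)."""
    class_name = "".join(p.capitalize() for p in dataset_name.split("_")) + "Settings"
    return type(class_name, (GenericDatasetSettings,), {"dataset_name": dataset_name})


# Idea 2: one loop over WORKING_PARTITIONS instead of one call per dataset.
# Keeping the classes in a dict also keeps them alive, which matters because
# __subclasses__() only holds weak references to subclasses.
SETTINGS_CLASSES = {name: create_dataset_settings(name) for name in WORKING_PARTITIONS}


# Idea 3: discover the settings classes instead of listing them by hand.
def discover_dataset_settings():
    """Map dataset name to settings class for every direct subclass."""
    return {cls.dataset_name: cls for cls in GenericDatasetSettings.__subclasses__()}
```

One caveat with the discovery approach: __subclasses__() returns only direct subclasses that have already been defined or imported, so a DatasetSettings built this way depends on import order and would need a recursive walk if the hierarchy ever gets deeper.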

bendnorman commented 2 years ago

Some of these improvements are covered in #1409 and #1410.

bendnorman commented 6 days ago

I haven't interacted with the pydantic settings classes in a while. I'm not sure if these proposals are still relevant.

catalyst-cooperative / pudl