The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
#1292 made it easier to validate new dataset settings. To add a new dataset, we currently have to:
Add the dataset's working tables to pc.PUDL_TABLES.
Add the dataset's working partitions to pc.WORKING_PARTITIONS.
Create a settings class by calling NewDatasetSettings = create_dataset_settings("new_dataset").
Add any additional validation methods to NewDatasetSettings.
Add NewDatasetSettings to DatasetSettings.
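
For illustration, the five steps above could look roughly like the sketch below. Everything here is a hypothetical stand-in: the module-level dictionaries mimic pc.PUDL_TABLES and pc.WORKING_PARTITIONS, and this create_dataset_settings is a simplified factory written for the example, not the actual implementation. The table names and years are made up.

```python
# Hypothetical stand-ins for the constants module (pc); real contents differ.
PUDL_TABLES = {"new_dataset": ("table_a", "table_b")}          # step 1
WORKING_PARTITIONS = {"new_dataset": {"years": (2019, 2020)}}  # step 2


def create_dataset_settings(dataset_name):
    """Hypothetical factory returning a settings class for one dataset."""
    allowed_tables = PUDL_TABLES[dataset_name]
    allowed_partitions = WORKING_PARTITIONS[dataset_name]

    def __init__(self, tables=None, **partitions):
        # Default to every working table and partition of the dataset.
        self.tables = tuple(tables) if tables is not None else allowed_tables
        self.partitions = {**allowed_partitions, **partitions}
        self.validate()

    def validate(self):
        unknown = set(self.tables) - set(allowed_tables)
        if unknown:
            raise ValueError(f"Unknown tables: {sorted(unknown)}")
        for name, values in self.partitions.items():
            if name not in allowed_partitions:
                raise ValueError(f"Unknown partition: {name}")
            bad = set(values) - set(allowed_partitions[name])
            if bad:
                raise ValueError(f"Unavailable {name}: {sorted(bad)}")

    return type(
        f"{dataset_name}_settings",
        (),
        {"__init__": __init__, "validate": validate},
    )


# Step 3: create the settings class.
NewDatasetSettings = create_dataset_settings("new_dataset")


# Step 4: add extra validation, here by subclassing and reusing the name.
class NewDatasetSettings(NewDatasetSettings):
    def validate(self):
        super().validate()
        if not self.tables:
            raise ValueError("new_dataset needs at least one table")


# Step 5: register the new settings class on the top-level container.
class DatasetSettings:
    def __init__(self, new_dataset=None):
        self.new_dataset = new_dataset or NewDatasetSettings()
```

Each new dataset repeats all five steps by hand, which is the repetition the suggestions below try to reduce.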
This isn't that bad, but I think we could improve in a few areas:
When a user adds a new dataset, they already have to specify its table schemas, so pc.WORKING_TABLES could be inferred from those schemas for each dataset.
Instead of manually calling create_dataset_settings(dataset_name) for every dataset in pc.WORKING_PARTITIONS, we could loop over the datasets in pc.WORKING_PARTITIONS and create the settings classes automatically.
DatasetSettings could be dynamically generated by finding all subclasses of GenericDatasetSettings.
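
A minimal sketch of how the three ideas might fit together, using only the standard library. TABLE_SCHEMAS, SETTINGS_CLASSES, the dataset names, and the simplified GenericDatasetSettings below are all hypothetical and written for this example; the real classes and constants will look different.

```python
# Hypothetical schema registry: dataset -> table name -> column names.
TABLE_SCHEMAS = {
    "dataset_a": {"plants": ["plant_id", "name"], "units": ["unit_id"]},
    "dataset_b": {"readings": ["timestamp", "value"]},
}
# Hypothetical stand-in for pc.WORKING_PARTITIONS.
WORKING_PARTITIONS = {
    "dataset_a": {"years": (2019, 2020)},
    "dataset_b": {"years": (2020,), "states": ("CO", "ID")},
}

# Idea 1: infer the working tables from the schemas instead of listing them.
WORKING_TABLES = {ds: tuple(tables) for ds, tables in TABLE_SCHEMAS.items()}


class GenericDatasetSettings:
    """Simplified base class; subclasses only set dataset_name."""

    dataset_name = None

    def __init__(self, tables=None, **partitions):
        allowed_tables = WORKING_TABLES[self.dataset_name]
        allowed_partitions = WORKING_PARTITIONS[self.dataset_name]
        self.tables = tuple(tables) if tables is not None else allowed_tables
        self.partitions = {**allowed_partitions, **partitions}
        unknown = set(self.tables) - set(allowed_tables)
        if unknown:
            raise ValueError(f"Unknown tables: {sorted(unknown)}")
        for name, values in self.partitions.items():
            # An unknown partition name has no allowed values, so it fails too.
            if not set(values) <= set(allowed_partitions.get(name, ())):
                raise ValueError(f"Invalid {name} for {self.dataset_name}")


def _class_name(dataset_name):
    """Turn 'dataset_a' into 'DatasetASettings'."""
    parts = dataset_name.split("_")
    return "".join(part.capitalize() for part in parts) + "Settings"


# Idea 2: loop over the datasets instead of one factory call per dataset.
# Keeping the classes in a dict also keeps them alive for __subclasses__().
SETTINGS_CLASSES = {
    ds: type(_class_name(ds), (GenericDatasetSettings,), {"dataset_name": ds})
    for ds in WORKING_PARTITIONS
}


# Idea 3: build the container from whatever subclasses exist.
class DatasetSettings:
    def __init__(self, **overrides):
        for cls in GenericDatasetSettings.__subclasses__():
            value = overrides.get(cls.dataset_name) or cls()
            setattr(self, cls.dataset_name, value)
```

One caveat with this approach: `__subclasses__()` only returns direct subclasses that have already been defined (or imported) when it is called, so subclass discovery depends on import order. Datasets needing extra validation could still define an explicit subclass by hand.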