Closed zaneselvans closed 2 years ago
I think we can create an EIA 860m DataSource definition and retain the settings validation that is happening in Eia860Settings.check_860m_date(). It could look something like this (I don't know what the DataSource model will look like):
from typing import ClassVar, List

from pydantic import validator


class Eia860Settings(GenericDatasetSettings):
    # Class-level metadata looked up from the DataSource definitions.
    working_partitions: ClassVar = DataSource.from_source("eia860").working_partitions
    eia860m_date: ClassVar[str] = DataSource.from_source("eia860m").working_partitions
    working_tables: ClassVar = DataSource.from_source("eia860").working_tables

    # User-facing settings, defaulting to everything the source supports.
    years: List[int] = working_partitions["years"]
    tables: List[str] = working_tables
    eia860m: bool = False

    @validator("eia860m")
    def check_860m_date(cls, eia860m: bool) -> bool:
        ...
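To make the idea concrete, here is a minimal, dependency-free sketch that uses plain dataclasses in place of Pydantic. Everything in it is an assumption for illustration: the stand-in DataSource class, the SOURCES registry, the "year_month" partition key, the partition values, and the specific rule checked in check_860m_date() are all hypothetical and not taken from the actual PUDL code.

```python
from dataclasses import dataclass, field
from typing import ClassVar, Dict, List


@dataclass
class DataSource:
    """Hypothetical stand-in; the real DataSource model may look different."""

    name: str
    working_partitions: Dict[str, object]

    @classmethod
    def from_source(cls, name: str) -> "DataSource":
        # Look the source up in a small in-memory registry (illustrative only).
        return SOURCES[name]


# Made-up partition values and keys, purely for illustration.
SOURCES: Dict[str, DataSource] = {
    "eia860": DataSource("eia860", {"years": [2019, 2020]}),
    "eia860m": DataSource("eia860m", {"year_month": "2021-08"}),
}


@dataclass
class Eia860Settings:
    # Shared, class-level metadata pulled from the DataSource definitions.
    working_partitions: ClassVar[Dict[str, object]] = (
        DataSource.from_source("eia860").working_partitions
    )
    eia860m_date: ClassVar[str] = (
        DataSource.from_source("eia860m").working_partitions["year_month"]
    )

    # Default to every working year declared by the data source.
    years: List[int] = field(
        default_factory=lambda: list(Eia860Settings.working_partitions["years"])
    )
    eia860m: bool = False

    def __post_init__(self) -> None:
        # Reject years that the data source does not declare as working.
        bad = sorted(set(self.years) - set(self.working_partitions["years"]))
        if bad:
            raise ValueError(f"Years not in eia860 working partitions: {bad}")
        self.check_860m_date()

    def check_860m_date(self) -> None:
        # Assumed rule: the EIA-860M date must be later than the newest
        # annual EIA-860 year. The real validation may differ.
        eia860m_year = int(self.eia860m_date.split("-")[0])
        newest_year = max(self.working_partitions["years"])
        if self.eia860m and eia860m_year <= newest_year:
            raise ValueError(
                f"eia860m date {self.eia860m_date} is not after {newest_year}"
            )
```

With these made-up values, Eia860Settings(years=[2020], eia860m=True) constructs cleanly, while Eia860Settings(years=[1999]) raises a ValueError. The point of the design is that the settings class owns the validation but reads the valid partitions from a single DataSource definition.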
This consolidation is done, with the exception of the Zenodo archiver, which has a bunch of its own (mostly non-duplicative) metadata; see #1419 and this draft PR.
Where is the Data Source metadata now

- pudl/metadata/sources.py (where we're trying to consolidate it)
- pudl-zenodo-storage/zs/metadata.py
- pudl/metadata/constants.py (licenses, contributors, etc.)
- pudl/workspace/datastore.py::ZenodoFetcher (DOIs for Zenodo archives)
- pudl/settings.py (working partitions)
- pudl/metadata/resources.py (working tables -- to be extracted dynamically based on etl_group)

Classes / Data Models:
Implemented using Pydantic.

EtlSettings
- [...] DataSource classes to do that.
- The EtlSettings object will define what tables are ultimately produced and which ETL functions are run.
- Each DataSource has its own specific EtlSettings class defined. There's another class that contains several of these individual EtlSettings objects and defines a whole PUDL ETL run.

DataSource
- Has working_partitions describing the chunks of data that can be processed.
- [...] DataSource by looking up which Resources have the DataSource.name in their etl_group parameter.

Resource & Package
- [...] etl_group they belong to.
- Each etl_group value corresponds to a different DataSource definition.
- Given etl_group / DataSource IDs, the corresponding Resource definitions can be identified and used to generate a Package. That Package could generate the list of all valid tables across all of the specified DataSources... if that's useful. But it would be simpler if we could keep the validation of tables specific to the individual DataSources and the DataSource-specific EtlSettings classes.

Questions
EIA-860m
- How does eia860m fit into all this? Is it a standalone DataSource, or is it bolted onto the side of eia860?
- [...] eia860 -- extracted and concatenated together, if eia860m==True.
- [...] eia860m from eia860 but still create clean metadata structures that can be used both in the Zenodo archives and in the EtlSettings context?

Implicit Tables
- [...] DataSource classes? The static tables, the glue tables? Do we always load them? Are they always valid?
- [...] DataSource, and becomes a property of the collection of all DataSources.
- If you're processing ferc1 then you automatically get the (static) ferc_accounts table. If you're processing ferc1 and eia923 then you automatically get the eia_ferc1 glue tables.
- [...] EtlSettings to specifying which data partitions and DataSources are being processed, and always assume all available tables within the data source will be provided, generating that list of tables internally.

Related Issues / PRs
- #1409
- #1410
- #1419
- #1424
- #1425