catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License
456 stars 106 forks

Consolidate data source metadata using a Pydantic model #1426

Closed zaneselvans closed 2 years ago

zaneselvans commented 2 years ago

Where is the Data Source metadata now?

Classes / Data Models:

Implemented using Pydantic.

EtlSettings

DataSource

Resource & Package

Questions

EIA-860m

Implicit Tables

Related Issues / PRs

bendnorman commented 2 years ago

I think we can create an EIA-860m DataSource definition and retain the settings validation that currently happens in Eia860Settings.check_860m_date(). It could look something like this (I don't know yet what the DataSource model will look like):

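from typing import ClassVar, List

from pydantic import validator

# DataSource and GenericDatasetSettings are assumed to be importable from PUDL.


class Eia860Settings(GenericDatasetSettings):
    # Pull partitions and tables from the consolidated DataSource metadata.
    working_partitions: ClassVar = DataSource.from_source("eia860").working_partitions
    # Assumes the EIA-860m partitions expose a single "year_month" string.
    eia860m_date: ClassVar[str] = DataSource.from_source("eia860m").working_partitions["year_month"]
    working_tables: ClassVar = DataSource.from_source("eia860").working_tables

    years: List[int] = working_partitions["years"]
    tables: List[str] = working_tables
    eia860m: bool = False

    @validator("eia860m")
    def check_860m_date(cls, eia860m: bool) -> bool:
        ...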
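
To make the idea above concrete, here is a minimal, dependency-free sketch of settings that take their defaults and validation limits from a single data source definition. It uses stdlib dataclasses in place of Pydantic, and everything in it is an assumption for illustration: the SOURCES registry, its partition values and table names, the "year_month" key, and the 860m rule in __post_init__ are all hypothetical and are not the actual PUDL metadata or the real check_860m_date() logic.

```python
from dataclasses import dataclass, field
from typing import Any, ClassVar, Dict, List

# Hypothetical stand-in for the consolidated metadata; all values are made up.
SOURCES: Dict[str, Dict[str, Any]] = {
    "eia860": {
        "working_partitions": {"years": [2019, 2020, 2021]},
        "working_tables": ["plants_eia860", "generators_eia860"],
    },
    "eia860m": {
        "working_partitions": {"year_month": "2021-08"},
        "working_tables": [],
    },
}


@dataclass(frozen=True)
class DataSource:
    """Single place that knows which partitions and tables are usable."""

    name: str
    working_partitions: Dict[str, Any]
    working_tables: List[str]

    @classmethod
    def from_source(cls, name: str) -> "DataSource":
        # Look the source up by its short name; raises KeyError if unknown.
        return cls(name=name, **SOURCES[name])


EIA860 = DataSource.from_source("eia860")
EIA860M = DataSource.from_source("eia860m")


@dataclass
class Eia860Settings:
    # Class-level values read from the data source definitions.
    eia860m_date: ClassVar[str] = EIA860M.working_partitions["year_month"]

    # Defaults come from the data source rather than being hard-coded here.
    years: List[int] = field(
        default_factory=lambda: list(EIA860.working_partitions["years"])
    )
    tables: List[str] = field(default_factory=lambda: list(EIA860.working_tables))
    eia860m: bool = False

    def __post_init__(self) -> None:
        # Reject any year or table the data source does not list as working.
        bad_years = sorted(set(self.years) - set(EIA860.working_partitions["years"]))
        if bad_years:
            raise ValueError(f"Years not available for eia860: {bad_years}")
        bad_tables = sorted(set(self.tables) - set(EIA860.working_tables))
        if bad_tables:
            raise ValueError(f"Tables not available for eia860: {bad_tables}")
        # Hypothetical 860m rule: the monthly data must be newer than every
        # requested annual year, otherwise it adds nothing.
        if self.eia860m and int(self.eia860m_date[:4]) <= max(self.years):
            raise ValueError("eia860m data is not newer than the requested years")
```

With this layout, Eia860Settings() picks up every working year and table by default, and Eia860Settings(years=[2019, 2020], eia860m=True) passes because the assumed 860m date falls in a later year. Adding a new year of data would then only require touching the data source definition.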

zaneselvans commented 2 years ago

This consolidation is done, except for the Zenodo archiver, which has a bunch of its own (mostly non-duplicative) metadata; see #1419 and this draft PR.