catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Nightly Build Failure 2024-04-27 #3593

Open zaneselvans opened 2 months ago

zaneselvans commented 2 months ago

Overview

class Extractor(ParquetExtractor):
    """Extractor for NREL ATB."""

    def __init__(self, *args, **kwargs):
        """Initialize the module.

        Args:
            ds (:class:`datastore.Datastore`): Initialized datastore.
        """
        self.METADATA = GenericMetadata("nrelatb")
        super().__init__(*args, **kwargs)

raw_nrelatb__all_dfs = raw_df_factory(Extractor, name="nrelatb")

@asset(
    required_resource_keys={"datastore", "dataset_settings"},
)
def raw_nrelatb__data(raw_nrelatb__all_dfs):
    """Extract raw NREL ATB data from annual parquet files to one dataframe.

    Returns:
        An extracted NREL ATB dataframe.
    """
    return Output(value=raw_nrelatb__all_dfs["data"])

cmgosnell commented 2 months ago

Thanks for catching the non-working partitions in the full settings! I'm also confused about why the validations didn't fail for me locally. After changing the working partitions in sources, I was able to re-run the full extraction and get only the working years. That's weird for sure.
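For reference, here is a minimal sketch of the kind of check that would be expected to reject non-working partitions. Everything in it is hypothetical: `WORKING_PARTITIONS`, `validate_partitions`, and the year values are made up for illustration and are not PUDL's actual settings code.

```python
# Hypothetical sketch: names and year values are invented, not taken from PUDL.
WORKING_PARTITIONS = {"nrelatb": {"years": [2021, 2022, 2023]}}


def validate_partitions(dataset: str, requested: dict[str, list[int]]) -> None:
    """Raise ValueError if any requested partition value is not a working one."""
    working = WORKING_PARTITIONS[dataset]
    for key, values in requested.items():
        # Anything requested that is not listed as working is an error.
        bad = sorted(set(values) - set(working.get(key, [])))
        if bad:
            raise ValueError(f"{dataset}: non-working {key} requested: {bad}")


# Requesting only working years passes silently.
validate_partitions("nrelatb", {"years": [2022, 2023]})
```

If a check like this runs when the settings are constructed, a bad year in the full settings would surface immediately rather than during extraction.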

A lot of the magic happens via extract.extractor.raw_df_factory, which runs extract.extractor.partition_extractor_factory, which uses the datastore and the dataset_settings. I was mirroring the EIA 176 extract, which requires those two as inputs to the asset but doesn't pass them around; instead, it accesses them within raw_df_factory.
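A toy illustration of that pattern may help: the factory builds a function that looks up the datastore and settings from its context itself, so callers never pass them in. All names here (`FakeContext`, `ToyExtractor`, `toy_raw_df_factory`) are hypothetical stand-ins using only the standard library, and the logic is an assumption about the general shape, not PUDL's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class FakeContext:
    """Stand-in for an orchestrator context that carries named resources."""

    resources: dict[str, Any] = field(default_factory=dict)


class ToyExtractor:
    """Hypothetical extractor: takes a datastore, returns pages for one year."""

    def __init__(self, ds: dict[int, list[str]]):
        self.ds = ds

    def extract(self, year: int) -> dict[str, list[str]]:
        return {"data": self.ds[year]}


def toy_raw_df_factory(
    extractor_cls: type, name: str
) -> Callable[[FakeContext], dict[str, list[str]]]:
    """Build a function that reads its own resources and merges all partitions."""

    def raw_all_dfs(context: FakeContext) -> dict[str, list[str]]:
        # Resources are pulled from the context here, not passed by the caller.
        ds = context.resources["datastore"]
        years = context.resources["dataset_settings"][name]["years"]
        extractor = extractor_cls(ds)
        merged: dict[str, list[str]] = {}
        for year in years:
            for page, rows in extractor.extract(year).items():
                merged.setdefault(page, []).extend(rows)
        return merged

    return raw_all_dfs


context = FakeContext(
    resources={
        "datastore": {2022: ["row-a"], 2023: ["row-b", "row-c"]},
        "dataset_settings": {"nrelatb": {"years": [2022, 2023]}},
    }
)
raw_all = toy_raw_df_factory(ToyExtractor, name="nrelatb")
print(raw_all(context)["data"])  # ['row-a', 'row-b', 'row-c']
```

The downside of this design, as noted above, is that the dependency on the two resources is invisible at the call site, which is part of why it reads as magic.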

I agree in general that the extractor setup needs some documentation cleanup and maybe some higher level explanation somewhere.

jdangerx commented 2 months ago

Tangible outcome here is:

should have failed on import but that wasn't happening.

jdangerx commented 5 days ago

@e-belfer might deal with this incidentally as part of integrating the new ATB.