catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Boiler manufacturer name and code are null #2522

Open zaneselvans opened 1 year ago

zaneselvans commented 1 year ago

Describe the bug

In the boilers_entity_eia table, the boiler_manufacturer and boiler_manufacturer_code columns contain only null values.

This is true in the outputs, and also right after clean_boilers_eia860() has run, which is where the manufacturer name is mapped from the manufacturer code:

    # Add boiler manufacturer name to column
    b_df["boiler_manufacturer"] = b_df.boiler_manufacturer_code.map(
        pudl.helpers.label_map(
            CODE_METADATA["environmental_equipment_manufacturers_eia"]["df"],
            from_col="code",
            to_col="description",
            null_value=pd.NA,
        )
    )

Not sure why this isn't being caught by our "no null columns" data validations.
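
For reference, here is a minimal sketch of the kind of check that should flag this. It uses a toy dataframe with made-up values rather than the real table, and the helper name `find_all_null_columns` is hypothetical: it is not part of PUDL's validation code.

```python
import pandas as pd


def find_all_null_columns(df: pd.DataFrame) -> list[str]:
    """Return the names of columns that contain no non-null values."""
    return [col for col in df.columns if df[col].isna().all()]


# Toy stand-in for boilers_entity_eia; the IDs below are invented.
boilers = pd.DataFrame(
    {
        "plant_id_eia": [1, 1, 2],
        "boiler_id": ["B1", "B2", "B1"],
        "boiler_manufacturer": pd.Series([pd.NA] * 3, dtype="string"),
        "boiler_manufacturer_code": pd.Series([pd.NA] * 3, dtype="string"),
    }
)

all_null = find_all_null_columns(boilers)
print(all_null)
```

If a check like this passes on the real table, the validation is probably either skipping these columns or running against a different set of years.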

e-belfer commented 1 year ago

Is this only happening in the fast ETL, or in the full ETL as well? I believe this column only exists in the 2009 and 2010 data.
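
One way to check would be to count non-null values per report year. The sketch below uses invented data, and the `report_year` column and the 2020 cutoff are assumptions for illustration, not the actual fast ETL configuration.

```python
import pandas as pd

# Toy stand-in for the cleaned boiler data; years and codes are invented.
boilers = pd.DataFrame(
    {
        "report_year": [2009, 2010, 2020, 2021],
        "boiler_manufacturer_code": pd.Series(
            ["AA", "BB", pd.NA, pd.NA], dtype="string"
        ),
    }
)

# Count non-null manufacturer codes in each report year.
non_null_by_year = boilers.groupby("report_year")["boiler_manufacturer_code"].count()
print(non_null_by_year)

# A run limited to recent years (hypothetical cutoff) sees only nulls here.
recent = boilers[boilers["report_year"] >= 2020]
recent_is_all_null = bool(recent["boiler_manufacturer_code"].isna().all())
print(recent_is_all_null)
```

If the real data looks like this, an all-null column would be expected in any ETL run that excludes the early years.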

zaneselvans commented 1 year ago

Ahhh, interesting. Not sure if I checked both.