catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Second half of 2022 missing from `fuel_receipts_costs_aggs_eia` table #2956

Status: Open. Opened by arengel 8 months ago.

arengel commented 8 months ago

Describe the bug

In the `fuel_receipts_costs_aggs_eia` table of `pudl.sqlite` (at least as of Oct 18), the 2022 data is incomplete for the 'quarterly' and 'monthly' temporal aggregations and missing entirely for 'annual'.

To Reproduce

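```python
import pandas as pd
import sqlalchemy as sa

# Local copy of the PUDL database (path elided).
engine = sa.create_engine("sqlite:////.../pudl.sqlite")

with engine.connect() as conn:
    frc_aggs = pd.read_sql_table("fuel_receipts_costs_aggs_eia", conn)

# Latest report_date available for each temporal aggregation.
frc_aggs.groupby("temporal_agg").report_date.max()
```

```
temporal_agg
annual      2021-01-01
monthly     2022-06-01
quarterly   2022-04-01
Name: report_date, dtype: datetime64[ns]
```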
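The maxima above can be checked against what a complete 2022 would look like. This is a minimal sketch, assuming each period is labeled by its first day (as the output suggests: Q2 2022 appears as 2022-04-01) and that `temporal_agg` only takes the values 'annual', 'quarterly', and 'monthly'. `expected_last_report_date` and `incomplete_aggs` are hypothetical helpers, not part of PUDL, and the toy frame holds only the per-aggregation maxima reported above.

```python
import pandas as pd


# Assumption: each period is labeled by its first day, so a complete year ends
# at YYYY-01-01 (annual), YYYY-10-01 (quarterly), and YYYY-12-01 (monthly).
LAST_MONTH_OF_YEAR = {"annual": 1, "quarterly": 10, "monthly": 12}


def expected_last_report_date(temporal_agg: str, year: int) -> pd.Timestamp:
    """Return the final report_date that a complete `year` should contain."""
    return pd.Timestamp(year=year, month=LAST_MONTH_OF_YEAR[temporal_agg], day=1)


def incomplete_aggs(df: pd.DataFrame, year: int) -> dict:
    """Map each temporal_agg that stops short of the end of `year` to its latest report_date."""
    latest = df.groupby("temporal_agg")["report_date"].max()
    return {
        agg: max_date
        for agg, max_date in latest.items()
        if max_date < expected_last_report_date(agg, year)
    }


# Toy frame holding only the per-aggregation maxima reported above.
demo = pd.DataFrame(
    {
        "temporal_agg": ["annual", "monthly", "quarterly"],
        "report_date": pd.to_datetime(["2021-01-01", "2022-06-01", "2022-04-01"]),
    }
)
print(incomplete_aggs(demo, 2022))  # all three aggregations are flagged
print(incomplete_aggs(demo, 2021))  # empty: every aggregation reaches the end of 2021
```

This only looks at the latest date per aggregation; gaps in the middle of a series would need a separate check.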

@UdayVaradarajan, tagging you here for visibility and in case you have anything to add.

zaneselvans commented 8 months ago

@arengel, in previous outputs was the data available through the end of 2022? The input archive appears to be from February 2023, and I think EIA updates that bulk data output frequently, so I would expect the full year to be there. I just want to clarify whether this is a regression or a newly discovered deficiency.

Looking at an older `pudl.sqlite` I have lying around from 2023-09-21, I get the same result as above.

arengel commented 8 months ago

It's a newly discovered deficiency as far as I know; we found it while updating our input datasets to include 2022.

zaneselvans commented 8 months ago

Okay, good to know.

Unfortunately, Zenodo's migration to a new backend at the end of last week has temporarily hosed our archiving infrastructure. @e-belfer is updating it to work with the new API over in this PR. As soon as that's fixed, we can make a new bulk electricity data archive and get the most recent data into it.