catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

Data Release 0.6.0 (2022-03-11) data from 2001-2008 #1552

Closed yifei-liu-yl closed 2 years ago

yifei-liu-yl commented 2 years ago

The data release in the Zenodo archive does not contain data from 2001-2008 for EIA923 and EIA860. Could data for these earlier years be added to the archive?

Many thanks and I appreciate your time.

zaneselvans commented 2 years ago

Well that would be bad! Can you say which tables you're seeing this problem in? Maybe with a code snippet?

Some of the tables don't have comparable data reported in the earlier years. E.g. generation_fuel_eia923 goes back to 2001, but the generation_eia923 and fuel_receipts_costs_eia923 tables only go back as far as 2008.

yifei-liu-yl commented 2 years ago

I think the tables involving EIA923 and EIA860 (e.g. pudl_out.gen_eia923(), pudl_out.bga_eia860(), pudl_out.frc_eia923(), pudl_out.bf_eia923()) do not have data for 2001-2008, and some of them do not have data after 2018. I think the output table from the MCOE analysis may be affected too. I basically followed the code below to get pudl_out:

    import sqlalchemy as sa
    import pudl

    # Locate the PUDL DB file
    pudl_settings = pudl.workspace.setup.get_defaults()

    # Create the SQLAlchemy connection engine
    pudl_engine = sa.create_engine(pudl_settings["pudl_db"])

    # List all the tables inside the database
    sa.inspect(pudl_engine).get_table_names()

    # Create the PUDL output object
    pudl_out = pudl.output.pudltabl.PudlTabl(pudl_engine=pudl_engine)

Thanks.

zaneselvans commented 2 years ago

All of the tables you mentioned only go back as far as 2008 due to limitations in the reporting to EIA; that data just wasn't available earlier. (The FRC data is available from another form, the EIA-423, as far back as 2002, but we haven't integrated it yet. See #1302.) We should definitely document this better in the metadata for those tables. I've created #1553 to track that.

Can you say more about the missing post-2018 data?

yifei-liu-yl commented 2 years ago

Just want to check whether I understand what's going on with the data: do these tables only go back to 2008 because that was the earliest time when the plant-generator-boiler linkage could be established?

And thanks for creating #1553 to make the metadata clearer.

I think I was mistaken about the missing post-2018 data.

zaneselvans commented 2 years ago

Hey sorry I missed this response earlier!

The boiler-generator associations aren't the limiting factor here (though they also don't go all the way back to 2001). The problem is that this data wasn't collected by the EIA at all in those earlier years. Or in some cases it was collected in a different data source (e.g. the EIA-423) which we haven't integrated yet. Before 2008, the only table that shows up in the EIA spreadsheets related to generation & fuel consumption is the "Generation and Fuel Data" tab, which becomes the generation_fuel_eia923 table in the PUDL DB. All the other tabs in the EIA-923 prior to 2008 just list the fuel stocks on hand.
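For anyone wanting to check which years a given table actually covers, here is a minimal sketch using only Python's standard sqlite3 module. It assumes each table has a date column named report_date holding ISO-formatted dates (an assumption for illustration, as is the helper name year_coverage), and it runs against a small in-memory database with made-up rows rather than the real PUDL DB, so the years printed are placeholders and not actual PUDL coverage.

```python
import sqlite3


def year_coverage(conn, tables, date_col="report_date"):
    """Return {table: (first_year, last_year)}, or None for an empty table.

    Hypothetical helper: assumes `date_col` holds ISO dates (YYYY-MM-DD),
    which sort correctly as plain strings.
    """
    coverage = {}
    for table in tables:
        # Identifiers cannot be bound as SQL parameters, so only pass
        # trusted table and column names here.
        first, last = conn.execute(
            f'SELECT MIN("{date_col}"), MAX("{date_col}") FROM "{table}"'
        ).fetchone()
        coverage[table] = (int(first[:4]), int(last[:4])) if first else None
    return coverage


# Toy stand-in for the PUDL DB: two tables with made-up report dates.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE generation_fuel_eia923 (report_date TEXT)")
conn.execute("CREATE TABLE generation_eia923 (report_date TEXT)")
conn.executemany(
    "INSERT INTO generation_fuel_eia923 VALUES (?)",
    [("2001-01-01",), ("2008-01-01",), ("2020-12-01",)],
)
conn.executemany(
    "INSERT INTO generation_eia923 VALUES (?)",
    [("2008-01-01",), ("2020-12-01",)],
)

coverage = year_coverage(conn, ["generation_fuel_eia923", "generation_eia923"])
for table, years in coverage.items():
    print(table, years)
```

Against a real PUDL SQLite file, the same function could be pointed at a connection to that file, provided the tables of interest carry a report_date column.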

I'll go ahead and close this issue since sadly the data just isn't available. But feel free to re-open it if I've gotten something wrong!