catalyst-cooperative / pudl

The Public Utility Data Liberation Project provides analysis-ready energy system data to climate advocates, researchers, policymakers, and journalists.
https://catalyst.coop/pudl
MIT License

EIA923M: Add data maturity flag and 2023 quarterly data #2930

Closed: aesharpe closed this issue 7 months ago

aesharpe commented 8 months ago

Our current EIA923 archiver contains both annual and year-to-date (YTD) data, so we need a way to distinguish the data maturity of each partition. We currently determine data_maturity with the add_data_maturity() function in the extract/excel.py module. In the future it might make sense to add a data_maturity field to the archive metadata, but for now we'll update add_data_maturity() to account for the EIA923 data. The data_maturity column already has a category, incremental_ytd, that was created for exactly this type of data.

The raw files are named as follows (examples):

EIA923_Schedules_2_3_4_5_M_07_2023_20SEP2023.xlsx
EIA923_Schedules_2_3_4_5_M_12_2022_Early_Release.xlsx

Here, the value after M is the last month for which the file has data. A value of 12 means the file contains annual data; a value less than 12 means it contains YTD data. We can use a regex to extract the month from the filename and add the incremental_ytd flag if it isn't 12.
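A minimal sketch of the regex approach, assuming the month always appears as two digits immediately after `_M_` (as in both example filenames). The function name `eia923_data_maturity` and the pattern are hypothetical illustrations, not the actual PUDL code in `add_data_maturity()`. Annual files return `None` so that whatever logic already labels them stays in charge.

```python
import re
from typing import Optional

# Hypothetical pattern: "_M_" followed by a two-digit month and a four-digit year,
# e.g. "..._M_07_2023_20SEP2023.xlsx" -> month "07", year "2023".
EIA923_FILENAME_RE = re.compile(r"_M_(?P<month>\d{2})_(?P<year>\d{4})")


def eia923_data_maturity(filename: str) -> Optional[str]:
    """Return "incremental_ytd" for year-to-date EIA923 files.

    Returns None for annual files (month == 12) so that the existing
    data_maturity logic can decide the label for those files.
    """
    match = EIA923_FILENAME_RE.search(filename)
    if match is None:
        raise ValueError(f"Unrecognized EIA923 filename: {filename}")
    month = int(match.group("month"))
    if not 1 <= month <= 12:
        raise ValueError(f"Invalid month {month} in {filename}")
    # Anything short of a full 12 months is year-to-date data.
    return "incremental_ytd" if month < 12 else None


if __name__ == "__main__":
    for name in [
        "EIA923_Schedules_2_3_4_5_M_07_2023_20SEP2023.xlsx",
        "EIA923_Schedules_2_3_4_5_M_12_2022_Early_Release.xlsx",
    ]:
        print(name, "->", eia923_data_maturity(name))
```

Returning None for annual files, rather than a specific label, avoids guessing which category the existing add_data_maturity() logic assigns to them (for example, to an Early_Release file).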

Resolving this issue will also add the 2023 quarterly EIA923 data.

aesharpe commented 8 months ago

Close with #2936

e-belfer commented 7 months ago

Marked done but not closed, closing