Open irm-codebase opened 3 months ago
This happens in the following:
ipdb> ch_industry_subsector_energy_use.unstack('year').columns.unique()
Index(['2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012',
'2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021',
'2022', '2023', 'FERN', 'nan'],
dtype='object', name='year')
Fixes (in annual_energy_balances.py)
The core problem is how the CHE industry data is read: the script relies on fixed row positions in the source document, so it will 'desync' whenever that document is updated each year. The current values probably correspond to the 2022 document. Someone updated the link, breaking the script.
In read_industry_subsector, update nrows=11.
Update ch_carriers to the below:
ch_carriers = {  # first row in which carriers are defined in the file
25: "E7000", # 'electricity',
53: "O4000XBIO", # 'oil',
81: "G3000", # 'gas',
108: "C0000X0350-0370", # 'solid_fuel',
132: "W6100_6220", # 'waste',
156: "O4000XBIO", # 'oil',
198: "H8000", # 'heat', # purchased
237: "R5110-5150_W6000RI", # 'biofuel'
}
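The desync described above can be illustrated with a small toy example. This is only a sketch: the sheet contents, the `read_block` helper and the row positions are made up for illustration and do not reflect the actual layout of the CHE document or the code in annual_energy_balances.py.

```python
import pandas as pd


def read_block(sheet: pd.DataFrame, first_row: int, nrows: int) -> pd.DataFrame:
    """Slice a fixed-size block of rows, mimicking a read with hardcoded offsets."""
    return sheet.iloc[first_row : first_row + nrows]


# Toy sheet (hypothetical): a carrier header, three year rows, then a footnote.
old_layout = pd.DataFrame(
    {"label": ["Electricity", "2021", "2022", "2023", "footnote"]}
)
# A newer edition of the same sheet with one extra row inserted at the top.
new_layout = pd.DataFrame(
    {"label": ["(new note)", "Electricity", "2021", "2022", "2023", "footnote"]}
)

# The same hardcoded offsets are applied to both editions.
old_years = read_block(old_layout, first_row=1, nrows=3)["label"].tolist()
new_years = read_block(new_layout, first_row=1, nrows=3)["label"].tolist()

print(old_years)  # only year labels
print(new_years)  # shifted by one row: a non-year label leaks in
```

With the old layout the block contains only year labels. With the new layout the same offsets pick up the header text instead, which is the kind of stray label that would later show up next to the years.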
What happened?
If you update the repo to the current version and try to build energy balances with it, scripts depending on 'annual-energy-balances.csv' will fail.
This is because two new columns ('NaN', 'FERN') are introduced somewhere in the CHE processing.
To replicate, just read the dataset like most rules do:
pd.read_csv("build/data/annual-energy-balances.csv", index_col=["cat_code", "carrier_code", "unit", "country", "year"], header=0).squeeze()
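To confirm the problem after reading the dataset, a check along these lines may help. It is a minimal sketch: `invalid_year_labels` is a hypothetical helper, not something in the repo, and the toy series below uses placeholder codes and values rather than real data.

```python
import re

import pandas as pd


def invalid_year_labels(data: pd.Series, level: str = "year") -> list[str]:
    """Return the labels in `level` that are not four-digit years."""
    labels = data.index.get_level_values(level).unique()
    return sorted(
        str(label) for label in labels if not re.fullmatch(r"\d{4}", str(label))
    )


# Toy stand-in for the squeezed CSV; codes and values are placeholders.
index = pd.MultiIndex.from_tuples(
    [
        ("cat_a", "E7000", "twh", "CHE", "2022"),
        ("cat_a", "E7000", "twh", "CHE", "2023"),
        ("cat_a", "E7000", "twh", "CHE", "FERN"),
        ("cat_a", "E7000", "twh", "CHE", "nan"),
    ],
    names=["cat_code", "carrier_code", "unit", "country", "year"],
)
toy = pd.Series([1.0, 2.0, 3.0, 4.0], index=index, name="value")

bad_labels = invalid_year_labels(toy)
print(bad_labels)
```

On the real file, the series returned by the `pd.read_csv(...).squeeze()` call above could be passed in place of `toy`. A non-empty result would indicate that stray labels ended up in the year level.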
Version
1.0.0
Relevant log output
No response