Swiss energy balance data introduces invalid column in annual energy balances.

irm-codebase commented 3 months ago

What happened?

If you update the repo to the current version and try to build energy balances with it, scripts that depend on 'annual-energy-balances.csv' will fail.

This is because two new columns ('NaN', 'FERN') are introduced somewhere in the CHE processing.

>>> df.unstack()
year                                 NaN  1950  1960  1970  1978  1979  1980  1981  1982  1983  1984  1985  ...  2013  2014  2015  2016  2017  2018  2019  2020  2021  2022  2023  FERN
cat_code  carrier_code unit country                                                                         ...                                                                        
AFC       BIOE         TJ   ALB      NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
                            AUT      NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
                            BEL      NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
                            BGR      NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
                            BIH      NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  ...   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN

To replicate, just read the dataset like most rules do:

import pandas as pd
pd.read_csv("build/data/annual-energy-balances.csv", index_col=["cat_code", "carrier_code", "unit", "country", "year"], header=0).squeeze()

Version

1.0.0

irm-codebase commented 3 months ago

The invalid labels already appear in ch_industry_subsector_energy_use:

ipdb> ch_industry_subsector_energy_use.unstack('year').columns.unique()
Index(['2004', '2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012',
       '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021',
       '2022', '2023', 'FERN', 'nan'],
      dtype='object', name='year')
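
A quick way to list the offending labels, as a minimal sketch. The helper name invalid_year_labels and the sample values are made up for illustration (they are not part of the repo), and the check assumes the year labels are stored as four-digit strings, as in the output above:

```python
import pandas as pd


def invalid_year_labels(data, level="year"):
    """Return the labels in `level` that are not four-digit years."""
    labels = data.index.get_level_values(level).unique()
    return [
        label for label in labels
        if not (str(label).isdigit() and len(str(label)) == 4)
    ]


# Illustrative data only: mimics the labels seen in the ipdb output above.
index = pd.MultiIndex.from_tuples(
    [
        ("AFC", "BIOE", "TJ", "CHE", "2022"),
        ("AFC", "BIOE", "TJ", "CHE", "2023"),
        ("AFC", "BIOE", "TJ", "CHE", "FERN"),
        ("AFC", "BIOE", "TJ", "CHE", "nan"),
    ],
    names=["cat_code", "carrier_code", "unit", "country", "year"],
)
sample = pd.Series([1.0, 2.0, 3.0, 4.0], index=index)

print(invalid_year_labels(sample))
```

The same helper should work on the Series read from 'annual-energy-balances.csv', provided the year level holds strings. If the labels were parsed as floats (e.g. 2004.0), the check would need adjusting.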

irm-codebase commented 3 months ago

Fixes (in annual_energy_balances.py):

The core problem is how the CHE industry data is read: the script relies on hard-coded row positions, so it will 'desync' as the source document is updated each year. The current values probably correspond to the 2022 document. Someone updated the link, which broke the script.

In read_industry_subsector, update nrows=11.
Update ch_carriers to the following:

ch_carriers = {  # first row in which carriers are defined in the file
    25: "E7000",  # 'electricity'
    53: "O4000XBIO",  # 'oil'
    81: "G3000",  # 'gas'
    108: "C0000X0350-0370",  # 'solid_fuel'
    132: "W6100_6220",  # 'waste'
    156: "O4000XBIO",  # 'oil'
    198: "H8000",  # 'heat' (purchased)
    237: "R5110-5150_W6000RI",  # 'biofuel'
}
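
Since the row offsets will drift again with the next yearly update, it might be worth failing early instead of letting bad labels reach the CSV. A rough sketch of such a guard follows. check_year_header is a hypothetical helper, not something in the script today, and it assumes the header labels of each block are available as a list of four-digit strings:

```python
def check_year_header(header, first_row):
    """Raise if a header row read at `first_row` contains non-year labels.

    Hypothetical guard: `header` is the list of column labels of one block,
    `first_row` is the hard-coded row offset the block was read from.
    """
    bad = [
        label for label in header
        if not (str(label).isdigit() and len(str(label)) == 4)
    ]
    if bad:
        raise ValueError(
            f"Block starting at row {first_row} has non-year column labels "
            f"{bad}; the hard-coded row offsets may no longer match the "
            "source file."
        )


# Passes silently for a clean header.
check_year_header(["2022", "2023"], first_row=25)

# A desynced block raises instead of producing a broken CSV.
try:
    check_year_header(["2022", "2023", "FERN", "nan"], first_row=25)
except ValueError as error:
    print(error)
```

This does not fix the offsets themselves. It only turns a silent desync into an explicit error at the point where the file is read.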

calliope-project / euro-calliope