etsap-TIMES / xl2times

Open source tool to convert TIMES models specified in Excel
https://xl2times.readthedocs.io/
MIT License
12 stars 7 forks source link

TimesModel.data_years() throws exception for non-int years #217

Open SamRWest opened 8 months ago

SamRWest commented 8 months ago

Austimes has some non-integer-parseable year values (e.g. 2070-EOH) which this function fails on:

pd.unique(attributes["year"])
array([0, 2015.0, 2016.0, 2017.0, 2018.0, 2019.0, 2020.0, 2021.0, 2022.0,
       2023.0, 2024.0, 2025.0, 2026.0, 2027.0, 2028.0, 2029.0, 2030.0,
       2031.0, 2032.0, 2033.0, 2034.0, 2035.0, 2036.0, 2037.0, 2038.0,
       2039.0, 2040.0, 2041.0, 2042.0, 2043.0, 2044.0, 2045.0, 2046.0,
       2047.0, 2048.0, 2049.0, 2050.0, 2051.0, 2052.0, 2053.0, 2054.0,
       2055.0, 2056.0, 2057.0, 2058.0, 2059.0, 2060.0, 2013.0, 2011.0,
       2009.0, 2007.0, 2005.0, 2101.0, 2012, '2070-EOH', 2061, 2010, 2014,
       '2024-EOH', '2045-EOH', 'BOH-EOH', 1980, 1981, 1982, 1983, 1984,
       1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995,
       1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2006, 2008],
      dtype=object)

If the intention was to collect all the valid years, this could be fixed by updating to:

data_years.update(attributes["year"].astype(int).values)

to:

 int_years = attributes["year"].astype(int, errors='ignore') # leave non-parseable vals alone
 int_years = [y for y in int_years if isinstance(y, int)]  # remove non-parseable years
 data_years.update(int_years.values)

Would this work, or would we need to interpret/expand these ranges?

olejandro commented 8 months ago

Interesting! Dash-separated values should not have reached this far... We should definetely expand them. Could you check in raw_tables.txt the tag of the table they are originating from?

SamRWest commented 8 months ago

Interesting! Dash-separated values should not have reached this far... We should definetely expand them. Could you check in raw_tables.txt the tag of the table they are originating from?

Sure thing, these are all the spots they appear in raw_tables.txt

range: L12:R16
filename: benchmarks/xlsx\../../../austimes-lfs\SuppXLS/Scen_Par-austimes_CCA.xlsx
tag: ~TFM_INS
types: Attribute (object), PSET_SET (object), PSET_PN (object), CSET_CN (object), Year (object), AllRegions (object)
Attribute,PSET_SET,PSET_PN,CSET_CN,Year,AllRegions
PRC_FOFF,,T*Dsl,"TraDsl,TraB20",2070-EOH,1
PRC_FOFF,,T*Pet,"TraPet,TraE10",2070-EOH,1
PRC_FOFF,,T*Lpg,TraLpg,2070-EOH,1
range: B72:F74
filename: benchmarks/xlsx\../../../austimes-lfs\SuppXLS/Scen_nsv-austimes_csiro_1.xlsx
uc_sets: {'R_S': 'NSA,ADE,SESA,NQ,CQ,SWQ,SEQ,LV,MEL,CVIC,NVIC,SWNSW,NNS,NCEN,CAN,TAS,ACT,NSW,QLD,SA,VIC'}
tag: ~TFM_INS
types: attribute (object), process (object), year (object), value (object)
attribute,process,year,value
PRC_AOFF,"EE_StmTurb312,EE_StmTurb313",2024-EOH,1
range: L10:R14
filename: benchmarks/xlsx\../../../austimes-lfs\SuppXLS/Scen_nsv-austimes_csiro_1.xlsx
tag: ~TFM_INS
types: Attribute (object), PSET_SET (object), PSET_PN (object), CSET_CN (object), Year (object), AllRegions (object)
Attribute,PSET_SET,PSET_PN,CSET_CN,Year,AllRegions
PRC_FOFF,,T*Dsl,"TraDsl,TraB20",2045-EOH,1
PRC_FOFF,,T*Pet,"TraPet,TraE10",2045-EOH,1
PRC_FOFF,,T*Lpg,TraLpg,2045-EOH,1
range: S22:AB25
filename: benchmarks/xlsx\../../../austimes-lfs\SuppXLS/Scen_TransportPolicies.xlsx
uc_sets: {'R_E': 'allregions'}
tag: ~TFM_INS
types: attribute (object), CSET_CN (object), year (object), allregions (object), NSW (object), QLD (object), VIC (object), PSET_CO (object), PSET_CI (object)
attribute,CSET_CN,year,allregions,NSW,QLD,VIC,PSET_CO,PSET_CI
PRC_FOFF,TraE10,BOH-EOH,1,0,0,0,TraE10,
PRC_FOFF,TraE10,BOH-EOH,1,,,,-T_pas?,TraE10
olejandro commented 8 months ago

Thanks! We currently expand dash-separated values only for FI_T tables (lines 542-555 in process_flexible_import_tables. The easiest fix would be to extract that code to a separate tranform and apply it also to TFM_INS tables.

SamRWest commented 8 months ago

No worries. Thanks for the pointer, I'll look into it.