IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0

`check_aggregate()` multiplier argument #387

Closed byersiiasa closed 4 years ago

byersiiasa commented 4 years ago

I'm not sure whether the `multiplier` argument is not working or I am using it wrong.

The data for 2010 is a perfect aggregate (the components sum to 100), whereas for 2015 the aggregate value is 100 but the components sum to only 80.

`check_aggregate()` does seem to work when multiplier=1, e.g.:

pyam.__version__
Out[68]: '0.5.0+12.g2101cfe'
test = pyam.IamDataFrame('data_agg_test.xlsx')
test.head(6)
Out[63]: 
  model scenario region             variable unit  year  value
0    mm        x     r1        Emissions|CO2  xxx  2010  100.0
1    mm        x     r1        Emissions|CO2  xxx  2015  100.0
2    mm        x     r1  Emissions|CO2|AFOLU  xxx  2010   55.0
3    mm        x     r1  Emissions|CO2|AFOLU  xxx  2015   70.0
4    mm        x     r1    Emissions|CO2|FFI  xxx  2010   45.0
5    mm        x     r1    Emissions|CO2|FFI  xxx  2015   10.0
# test for exact - this works
test.check_aggregate(variable='Emissions|CO2')
pyam.core - INFO: `Emissions|CO2` - 1 of 2 rows are not aggregates of components
Out[58]: 
                                               variable  components
model scenario region variable      unit year                      
mm    x        r1     Emissions|CO2 xxx  2015     100.0        80.0

Now testing this with the `multiplier` argument, two rows are identified as failures. That looks wrong to me: both should have passed, since we set a very high threshold multiplier.

test.check_aggregate(variable='Emissions|CO2', multiplier=5)
pyam.core - INFO: `Emissions|CO2` - 2 of 2 rows are not aggregates of components
Out[60]: 
                                               variable  components
model scenario region variable      unit year                      
mm    x        r1     Emissions|CO2 xxx  2010     100.0       100.0
                                         2015     100.0        80.0

But it does work in this case: the exact aggregate (2010) passes, and the wrong aggregate (2015) is picked up correctly.

test.check_aggregate(variable='Emissions|CO2', multiplier=1.0000000001)
pyam.core - INFO: `Emissions|CO2` - 1 of 2 rows are not aggregates of components
Out[62]: 
                                               variable  components
model scenario region variable      unit year                      
mm    x        r1     Emissions|CO2 xxx  2015     100.0        80.0

data_agg_test.xlsx

danielhuppmann commented 4 years ago

I think you are confusing the multiplier with the tolerance levels, which can be passed through to np.isclose() (i.e. atol, rtol). The multiplier says that the value of the variable should be close to multiplier times the value of the components. In the second example, 100 is neither 5x100 nor 5x80, so both rows correctly fail with the standard tolerance. In the third example, the multiplier differs from 1 by less than the standard tolerance, so the function considers the value and the components equal for 2010, and only 2015 fails.
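To make the distinction concrete, here is a minimal sketch of the comparison using plain NumPy and the numbers from the example. The helper `is_aggregate` is hypothetical and is not pyam's actual code; it only assumes, as described above, that the check boils down to `np.isclose(variable, multiplier * components)` with optional `rtol`/`atol`.

```python
import numpy as np

# Values from the example (region r1, years 2010 and 2015)
variable = np.array([100.0, 100.0])    # Emissions|CO2
components = np.array([100.0, 80.0])   # AFOLU + FFI


def is_aggregate(variable, components, multiplier=1, **kwargs):
    """Hypothetical helper: True where variable is close to multiplier * components.

    Extra keyword arguments (rtol, atol) are forwarded to np.isclose.
    """
    return np.isclose(variable, multiplier * components, **kwargs)


# Default: only 2010 is an exact aggregate
print(is_aggregate(variable, components))

# multiplier=5 compares 100 against 500 and 400, so both years fail
print(is_aggregate(variable, components, multiplier=5))

# A multiplier this close to 1 is absorbed by the default tolerance
print(is_aggregate(variable, components, multiplier=1.0000000001))

# Loosening the tolerance is what actually lets 2015 pass
print(is_aggregate(variable, components, rtol=0.3))
```

So to accept the 2015 row, the knob to turn would be the tolerance (`rtol` or `atol`), not the multiplier.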

byersiiasa commented 4 years ago

Yeah, now I see what you mean, thanks!

I thought that multiplier was used to set the tolerance (e.g. 1000), because I think a previous version had a tolerance argument that I then assumed had been replaced by multiplier.