Write result default values

trevorb1 commented 2 years ago

In this PR, I have added functionality to write out default result values. This builds on PR #125.

This functionality includes:

Reading in result default values from config.yaml
Adding the result default value to the dataframe if no value exists
Adding appropriate unit tests

An example of this functionality is shown below.

Previous Functionality: For new capacity results, years with no capacity additions (here 2017, 2018 and 2019) are simply absent from the output.

REGION	TECHNOLOGY	YEAR	VALUE
SIMPLICITY	HYD	2015	5
SIMPLICITY	HYD	2016	5
SIMPLICITY	HYD	2020	5

New Functionality: For new capacity results, the result default value in config.yaml is 0, so the missing years are written out with that value.

REGION	TECHNOLOGY	YEAR	VALUE
SIMPLICITY	HYD	2015	5
SIMPLICITY	HYD	2016	5
SIMPLICITY	HYD	2017	0
SIMPLICITY	HYD	2018	0
SIMPLICITY	HYD	2019	0
SIMPLICITY	HYD	2020	5
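A minimal sketch of the idea, not the code in this PR: the rows above can be reproduced by reindexing the result over the full product of its index sets and filling the gaps with the default. The function name `expand_defaults` and the `sets` dictionary are hypothetical and exist only for this illustration.

```python
import pandas as pd


def expand_defaults(df, index_values, default):
    """Return df with every missing index combination filled with default.

    df is indexed by the names in index_values (hypothetical helper,
    not the otoole implementation).
    """
    full_index = pd.MultiIndex.from_product(
        list(index_values.values()), names=list(index_values.keys())
    )
    # reindex keeps existing rows and creates the missing ones with fill_value
    return df.reindex(full_index, fill_value=default)


# Sparse result as written before this PR: 2017-2019 are absent
results = pd.DataFrame(
    {
        "REGION": ["SIMPLICITY"] * 3,
        "TECHNOLOGY": ["HYD"] * 3,
        "YEAR": [2015, 2016, 2020],
        "VALUE": [5, 5, 5],
    }
).set_index(["REGION", "TECHNOLOGY", "YEAR"])

# Assumed set members for the example
sets = {
    "REGION": ["SIMPLICITY"],
    "TECHNOLOGY": ["HYD"],
    "YEAR": list(range(2015, 2021)),
}

expanded = expand_defaults(results, sets, default=0)
print(expanded.reset_index().to_string(index=False))
```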

This can lead to large amounts of data being written out, though. For example, RateOfActivity is defined over region, technology, year, timeslice, fuel and mode. In cases like this, many rows holding only the default value may be written out. I'm not sure whether that behaviour is actually desired.
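To get a rough feel for the scale, the number of rows in a fully expanded result is the product of the sizes of its index sets. The set sizes below are made-up numbers for illustration, not taken from any real model, and the sparse row count is likewise an assumption.

```python
import math

# Hypothetical set sizes, using the dimensions listed above
set_sizes = {
    "REGION": 1,
    "TECHNOLOGY": 50,
    "YEAR": 30,
    "TIMESLICE": 96,
    "FUEL": 20,
    "MODE_OF_OPERATION": 2,
}

# Rows written if every index combination gets a value
dense_rows = math.prod(set_sizes.values())

# Assumed number of non-default rows in the sparse result
sparse_rows = 100_000

print(f"dense rows:  {dense_rows:,}")
print(f"sparse rows: {sparse_rows:,}")
print(f"expansion factor: {dense_rows / sparse_rows:.1f}x")
```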

willu47 commented 2 years ago

@trevorb1 - I think we should revisit this. I suggest we avoid writing out zero-valued results, especially when they go to text files (such as CSV), as it makes writing very slow.

Where zeroed intermediate values are required for a calculation to be performed correctly, it might be possible to use the fill_value argument in pandas operations instead. From the pandas documentation:

fill_value: float or None, default None. Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.

See pandas.DataFrame.mul, for example.
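As a small illustration of that suggestion (the frames and values here are invented for the example, not otoole data), multiplying a sparse result by a dense parameter with and without fill_value might look like this:

```python
import pandas as pd

# Sparse result: no rows for 2017-2019
capacity = pd.DataFrame(
    {"VALUE": [5.0, 5.0, 5.0]},
    index=pd.Index([2015, 2016, 2020], name="YEAR"),
)

# Dense parameter: one row per year
cost = pd.DataFrame(
    {"VALUE": [10.0] * 6},
    index=pd.Index(range(2015, 2021), name="YEAR"),
)

# Without fill_value, years missing from capacity become NaN after alignment
without_fill = capacity.mul(cost)

# With fill_value=0, the missing capacity rows are treated as 0 during the
# computation, without ever storing the zeros in the sparse frame
with_fill = capacity.mul(cost, fill_value=0)

print(without_fill.sort_index())
print(with_fill.sort_index())
```

The sparse frame stays sparse on disk; the zeros only exist for the duration of the calculation.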

trevorb1 commented 1 year ago

UPDATES

Following your feedback, @willu47, I have made the writing of default values optional, behind a CLI flag. If the flag is not specified, results are written out as before. The flag works both when converting between formats and when calculating results.

LOGIC

Added an _expand_defaults(...) function to the WriteStrategy class. If the --write_defaults flag is passed, the internal data structure (i.e. the dictionary of dataframes) is modified to include all default values. This happens before any of the format-specific writing functions are called.

EXAMPLE

For data conversions:

otoole convert datapackage datafile datapackage.json data.txt config.yaml --write_defaults

For result processing:

otoole results cbc csv cbc.sol results config.yaml --input_datapackage datapackage.json --write_defaults

OTHER UPDATES

Tests added for the _expand_defaults() function
Docs updated to reflect the --write_defaults flag

Thanks! Please let me know if anything else needs to be changed and I will make sure to get to it sooner this time!

OSeMOSYS / otoole

Write result default values #126