Closed trevorb1 closed 1 year ago
@trevorb1 - I think we should revisit this. I suggest we avoid writing out zero-valued results, especially to text files such as CSV, because doing so makes performance very slow.
Where zeroed intermediate values are required for a calculation to be performed correctly, it might be possible to use the fill_value argument in pandas operations, e.g.
fill_value: float or None, default None. Fill existing missing (NaN) values, and any new element needed for successful DataFrame alignment, with this value before computation. If data in both corresponding DataFrame locations is missing the result will be missing.
See pandas.DataFrame.mul for an example.
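As a rough illustration of the fill_value suggestion, the sketch below multiplies two sparse tables without first writing zeros into either of them. The table names, the VALUE column and the numbers are made up for the example and are not taken from otoole.

```python
import pandas as pd

# Hypothetical sparse result: no row exists for 2019 because the value is zero
new_capacity = pd.DataFrame(
    {"VALUE": [1.5, 2.0]}, index=pd.Index([2020, 2021], name="YEAR")
)
# Hypothetical parameter defined for every year
unit_cost = pd.DataFrame(
    {"VALUE": [10.0, 10.0, 10.0]}, index=pd.Index([2019, 2020, 2021], name="YEAR")
)

# Missing rows are treated as 0 during alignment, so 2019 becomes 0 rather than NaN
cost = new_capacity.mul(unit_cost, fill_value=0)

# The plain operator aligns the same way but leaves NaN where a row was missing
plain = new_capacity * unit_cost
```

The stored tables stay sparse; the zeros only exist for the duration of the calculation.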
UPDATES
Following your feedback, @willu47, I have made the writing of default values an optional CLI flag. If the flag is not specified, the results are written out as normally expected. This flag works both when converting between formats and when calculating results.
LOGIC
Added an _expand_defaults(...) function to the WriteStrategy class. If the --write_defaults flag is passed, the internal data structure (i.e. the dictionary of dataframes) is modified to include all default values. This happens before any of the format-specific writing functions are called.
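The expansion step described above could look roughly like the following. This is a minimal sketch, not the code in this PR: the function name expand_defaults, its arguments, and the assumption that each dataframe has a MultiIndex named after its sets and a single VALUE column are all illustrative.

```python
from typing import Dict, List

import pandas as pd


def expand_defaults(
    data: Dict[str, pd.DataFrame],
    sets: Dict[str, List],
    defaults: Dict[str, float],
) -> Dict[str, pd.DataFrame]:
    """Return a new dict in which each table holds every index combination.

    Hypothetical helper: assumes each dataframe is indexed by a MultiIndex
    whose level names are keys of `sets`.
    """
    expanded = {}
    for name, df in data.items():
        if name not in defaults:
            # No default known for this table, so leave it untouched
            expanded[name] = df
            continue
        # Cartesian product of all set members for this table's index levels
        full_index = pd.MultiIndex.from_product(
            [sets[level] for level in df.index.names], names=df.index.names
        )
        # Rows absent from the sparse table receive the default value
        expanded[name] = df.reindex(index=full_index, fill_value=defaults[name])
    return expanded


# Usage with made-up data: capacity is only added in 2020 and 2021
index = pd.MultiIndex.from_tuples(
    [("R1", "GAS", 2020), ("R1", "GAS", 2021)],
    names=["REGION", "TECHNOLOGY", "YEAR"],
)
results = {"NewCapacity": pd.DataFrame({"VALUE": [1.5, 2.0]}, index=index)}
sets = {
    "REGION": ["R1"],
    "TECHNOLOGY": ["GAS"],
    "YEAR": [2017, 2018, 2019, 2020, 2021],
}
full = expand_defaults(results, sets, {"NewCapacity": 0})
```

Doing the expansion once on the in-memory dictionary, before any writer runs, means every output format sees the same dense data without each writer reimplementing the logic.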
EXAMPLE
For data conversions:
otoole convert datapackage datafile datapackage.json data.txt config.yaml --write_defaults
For result processing:
otoole results cbc csv cbc.sol results config.yaml --input_datapackage datapackage.json --write_defaults
OTHER UPDATES
--write_defaults flag
Thanks! Please let me know if anything else needs to be changed and I will make sure to get to it sooner this time!
In this PR, I have added functionality to write out default result values. This builds on PR #125.
This functionality includes:
config.yaml
An example of this functionality is shown below.
Previous Functionality
For new capacity results, there may be no capacity additions in 2017, 2018 and 2019, so no rows are written for those years.
New Functionality
For new capacity results, the result default value in config.yaml is 0, so 2017, 2018 and 2019 are written out with a value of 0.
This can lead to large amounts of data being written out, though. For example, RateOfActivity is defined over region, technology, year, timeslice, fuel, mode. In these cases lots of empty data may be written out. Not sure if that behaviour is actually desired?
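To get a feel for the scale of the concern, the number of rows in a fully expanded table is the product of the set sizes. The sizes below are invented for illustration and do not come from any particular model.

```python
from math import prod

# Hypothetical set sizes for a mid-sized model
set_sizes = {
    "region": 1,
    "technology": 100,
    "year": 40,
    "timeslice": 96,
    "fuel": 30,
    "mode": 2,
}

# Every combination gets a row once defaults are written out
dense_rows = prod(set_sizes.values())
print(f"{dense_rows:,} rows")  # 23,040,000 rows
```

Even if only a small fraction of those combinations are non-zero in the solution, all of them would be written when defaults are expanded.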