OSeMOSYS / otoole

OSeMOSYS Tools for Energy
https://otoole.readthedocs.io
MIT License
25 stars 19 forks source link

Create missing params when reading from excel #136

Closed trevorb1 closed 1 year ago

trevorb1 commented 1 year ago

Hello!

Overview

In this PR I implement logic in the ReadExcel class to create empty dataframes for parameters that do not have an associated excel sheet. This PR relates to issue #99, and this comment in issue #114.

Problem Addressed

I was reviewing issue #99 and encountered a very similar issue to the one described. When we convert from anything to an Excel file, only parameters that have non-default values are written. For example, in Simplicity the DiscountRate is given by a default value, therefore, no DiscountRate sheet will appear after running the command otoole convert <> excel <> params.xlsx config.yaml.

In a different comment (see issue #114) I discuss if this is actually the functionality we want. Or if we should be writing out empty sheets when the parameter does not have values to help guide users on how to input data.

Nevertheless, with the current functionality, when reading in the Excel data it is not capturing parameters without dedicated sheets. For example, if no DiscountRate sheet exists in the excel file, then no DiscountRate key will be created in the otoole internal datastore. While this doesn't cause any otoole logic issues, no reference to the missing parameters will be written. This is problematic, as, for example, you will have an OSeMOSYS datafile without a DiscountRate parameter.

Logic

To fix this, I add a new function to the ReadExcel class called _get_missing_params( ... ). This function takes in the parameters read in from excel and compares them against all defined parameters in the config.yaml file. If a parameter is missing in the excel data, an empty dataframe is created and added to the internal datastore. Two tests are added to the test_read_strategies.py module to accompany this logic.

Other notes

Currently, this only checks parameters. A minor modification is needed if we want it to check for missing sets as well. However, I decided to hold off on adding that until we discussed if this is the correct path to go down to fix this issue :)

Thanks!

trevorb1 commented 1 year ago

Hi @willu47! Good call, I think that makes way more sense! I have updated this PR to generalize the _get_missing_params( ... ) function (issue #99), and provide excel templates for missing parameters (issue #114). These two issues are grouped into one PR since they cover similar issues; however, if quite a few more changes needs to happen from this point I am happy to break them up.

Issue 99 Updates

In the commit "added function to get missing input values", I move the _get_missing_params( ... ) function to the ReadStrategy class, rename it to _get_missing_input_dataframes(...), and generalize it to work with any of param, sets, and result types. A test is added as well to the TestReadStrategy class.

Questions

  1. The function _get_missing_input_dataframes(...) will create empty, but correctly formatted, dataframes for any missing param, set or result that is listed in the config file but is not found in the internal datastore after reading in data. To keep this general, I have the user pass in an argument for what parameter to search for (param, set or result).

My question is, I often have to loop over two of these items (see comment below) which seems a little clunky. Do you think there is a way to simplify this or change the logic at all? My first thought was to just accept a list of inputs to loop over, instead of a single string. However I just move the loop to inside the function which I'm not sure is any better haha 😅

Issue 114 Updates

In the commit "added excel template function", I add a new function to the WriteExcel class called _form_parameter_template(...) which will create wide formatted template dataframes when writing out params that do not have data. A test for this is also added.

Questions

  1. To create the wide formatted data, I needed to pass in the input_data to correctly index the columns as years. This involved changing the WriteStrategy.write(...) method to accept keyword arguments to allow for the passing of input_data. While I don't think this was a bad solution, Im not sure how clean it is either. I couldn't think of any other way to get the years for long formatted data to the WriteExcel class though?

  2. I slightly changed the logic of the the _form_parameter(...) method. I moved the check for empty dataframes out of this function and to where the function is called in the WriteExcel.write_parameter(...) method. While this means _form_parameter(...) will no longer catch if the input df is empty, since it is a private method and only used in this one place, I thought it improved readability. My question is, does this tradeoff make sense? Or is it better to keep those high level quick checks in the function?