OSeMOSYS / osemosys2iamc

MIT License
3 stars 11 forks source link

Refactor filter #23

Closed willu47 closed 1 year ago

willu47 commented 2 years ago

Combined filter emission and filter emission tech

HauHe commented 1 year ago

Regarding the question in your initial comment @willu47, the function of which line 81 was part of was only dealing with the emissions from CCS (biomass and fossil fuels), so the captured CO2, and those need to be reported in positive numbers in the IAMC format. I realise the function name and description could have been a bit more indicative.

danielhuppmann commented 1 year ago

Re @HauHe for avoiding possible misunderstandings:

dealing with the emissions from CCS (biomass and fossil fuels), so the captured CO2, and those need to be reported in positive numbers in the IAMC format.

The IAMC format specifies that each data point is indexed by the dimensions model, scenario, region, variable, unit and a time domain.

The convention whether a value should be reported as positive or negative, what exactly should be counted, and what the relevant unit should be is always linked to the variable.

So the statement should read "captured CO2 is reported in positive numbers in the variable x" - but I'm not sure if that is actually true, at least not for "Emissions|CO2"...

HauHe commented 1 year ago

Thanks @danielhuppmann for the help in being precise! The function that Will was referring to was/is producing data for the IAMC variables Carbon Capture|Biomass and Carbon Capture|Fossil. In OSeMBE we get these emissions as negative numbers, but the IAMC variables should be positive if I'm not mistaken.

danielhuppmann commented 1 year ago

Thanks for the clarification @HauHe, you are right, the Carbon Capture|* variables should be reported as positive values as specified in the openENTRANCE variable codelist.

https://github.com/openENTRANCE/openentrance/blob/2315b611e8cc8b86701ad78407c90dbe30f30175/definitions/variable/emissions/emissions.yaml#L4

willu47 commented 1 year ago

Thanks a lot for spending the time and improving this script @willu47 !

Hi @HauHe - no problem - a guilty pleasure!

It looks good to me. But I haven't run it. Is that commonly done when reviewing in code development?

Yes, often a reviewer will try to run the code to ensure it works on their system.

I have only one question, relating to the above discussion on the IAMC variables Carbon Capture|Biomass and Carbon Capture|Fossil, where do we turn the values positive? Perhaps I just missed it in the scripts.

At the moment we don't. One idea is that we add a transform: abs field to the configuration which tells the script to return the absolute value.

willu47 commented 1 year ago

@HauHe - this is ready for your check. Note the two most recent commits. It would be great to test this with a "production" run now.

willu47 commented 1 year ago

@Hauke - to test on your laptop prior to merging:

git pull origin main
git checkout refactor_filter
pip install -e .
osesmosys2iamc --help
HauHe commented 1 year ago

Hi @willu47, I installed the package and tried to convert some OSeMBE results. But I get the below error. Seems like some function is lacking some input, but not sure how to track down why. Before the error the terminal showed quite a lot of dataframes. Mostly 3 columns and the printing the first and last 5 rows.

C:\Users\adm.esa\Documents\GitHub\OSeMOSYS\osemosys2iamc\src\osemosys2iamc\resultify.py:147: FutureWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.
  years = pd.Series()
Traceback (most recent call last):
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\Scripts\osemosys2iamc.exe\__main__.py", line 7, in <module>
  File "C:\Users\adm.esa\Documents\GitHub\OSeMOSYS\osemosys2iamc\src\osemosys2iamc\resultify.py", line 432, in entry_point
    all_data = main(config, inputs_path, results_path)
  File "C:\Users\adm.esa\Documents\GitHub\OSeMOSYS\osemosys2iamc\src\osemosys2iamc\resultify.py", line 357, in main
    data = calculate_trade(results, technologies)
  File "C:\Users\adm.esa\Documents\GitHub\OSeMOSYS\osemosys2iamc\src\osemosys2iamc\resultify.py", line 150, in calculate_trade
    df_f = filter_technologies(techs)
TypeError: filter_technologies() missing 1 required positional argument: 'technologies'
HauHe commented 1 year ago

Ok, figured out what's wrong

HauHe commented 1 year ago

Sorry that alone didn't fix it yet.

HauHe commented 1 year ago

The new error is in related to the line to set the index before the conversion to a IamDataFrame:


Traceback (most recent call last):
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\Scripts\osemosys2iamc.exe\__main__.py", line 7, in <module>
  File "C:\Users\adm.esa\Documents\GitHub\OSeMOSYS\osemosys2iamc\src\osemosys2iamc\resultify.py", line 432, in entry_point
    all_data = main(config, inputs_path, results_path)
  File "C:\Users\adm.esa\Documents\GitHub\OSeMOSYS\osemosys2iamc\src\osemosys2iamc\resultify.py", line 387, in main
    data.index = data.index.set_levels(data.index.levels[2].map(iso_mapping), level=2)
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\lib\site-packages\pandas\core\indexes\multi.py", line 912, in set_levels
    idx._set_levels(
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\lib\site-packages\pandas\core\indexes\multi.py", line 795, in _set_levels
    new_codes = self._verify_integrity(levels=new_levels)
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\lib\site-packages\pandas\core\indexes\multi.py", line 418, in _verify_integrity
    raise ValueError(
ValueError: Level values must be unique: [nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan] on level 2```
willu47 commented 1 year ago

This is the error caused by not installing the openentrance with an editable install. Run pip uninstall openentrance then try to install openentrance again

danielhuppmann commented 1 year ago

I assume that this is related to https://github.com/openENTRANCE/openentrance/issues/202.

From my point of view, there are three options:

Happy to discuss this further wherever you feel the discussion is most appropriate...

HauHe commented 1 year ago

Sorry, missed to install openEntrance as editable. Did so now, but I'm getting a new error. It seems like there is an issue with the column region the dataframe that should be converted to IAMC format. See below:

[1286 rows x 3 columns]
2022-08-24 11:00:21 ERROR    Empty cells in `data` (columns: 'region'):
             model   scenario region                      variable unit  year      value
783  OSeMBE v1.0.0  DIAG-Base    NaN          Capacity|Electricity   GW  2015  43.185219
784  OSeMBE v1.0.0  DIAG-Base    NaN  Capacity|Electricity|Biomass   GW  2015   0.087459
785  OSeMBE v1.0.0  DIAG-Base    NaN     Capacity|Electricity|Coal   GW  2015   0.008000
786  OSeMBE v1.0.0  DIAG-Base    NaN      Capacity|Electricity|Gas   GW  2015   1.315077
787  OSeMBE v1.0.0  DIAG-Base    NaN    Capacity|Electricity|Hydro   GW  2015  31.851904
...
Traceback (most recent call last):
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\Scripts\osemosys2iamc.exe\__main__.py", line 7, in <module>
  File "C:\Users\adm.esa\Documents\GitHub\OSeMOSYS\osemosys2iamc\src\osemosys2iamc\resultify.py", line 432, in entry_point
    all_data = main(config, inputs_path, results_path)
  File "C:\Users\adm.esa\Documents\GitHub\OSeMOSYS\osemosys2iamc\src\osemosys2iamc\resultify.py", line 388, in main
    all_data = pyam.IamDataFrame(data)
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\lib\site-packages\pyam\core.py", line 159, in __init__
    self._init(data, meta, index=index, **kwargs)
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\lib\site-packages\pyam\core.py", line 175, in _init
    _data = format_data(data.copy(), index=index, **kwargs)
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\lib\site-packages\pyam\utils.py", line 358, in format_data
    raise_data_error(
  File "C:\Users\adm.esa\Anaconda3\envs\osemosys2iamc_test\lib\site-packages\pyam\logging.py", line 32, in raise_data_error
    raise ValueError(msg)
ValueError: Empty cells in `data` (columns: 'region'):
             model   scenario region                      variable unit  year      value
783  OSeMBE v1.0.0  DIAG-Base    NaN          Capacity|Electricity   GW  2015  43.185219
784  OSeMBE v1.0.0  DIAG-Base    NaN  Capacity|Electricity|Biomass   GW  2015   0.087459
785  OSeMBE v1.0.0  DIAG-Base    NaN     Capacity|Electricity|Coal   GW  2015   0.008000
786  OSeMBE v1.0.0  DIAG-Base    NaN      Capacity|Electricity|Gas   GW  2015   1.315077
787  OSeMBE v1.0.0  DIAG-Base    NaN    Capacity|Electricity|Hydro   GW  2015  31.851904
danielhuppmann commented 1 year ago

@hauhe, not quite sure why this is formatted as a long string - the way that this is usually displayed is intended to be more easily readable.

The relevant line is

ValueError: Empty cells in data (columns: 'region')

and the part after that gives you an indication where missing row is located.

willu47 commented 1 year ago

@HauHe, not quite sure why this is formatted as a long string - the way that this is usually displayed is intended to be more easily readable.

The relevant line is

ValueError: Empty cells in data (columns: 'region')

and the part after that gives you an indication where missing row is located.

@HauHe - surround you code with three backticks to make a codeblock. Add python after the first three backticks to get syntax highlighting e.g.

```python your code ```

willu47 commented 1 year ago

@HauHe - could you share the results set and configuration file so I can try to replicate this bug?

HauHe commented 1 year ago

@willu47 I have put them here https://kth-my.sharepoint.com/:f:/g/personal/vmartin_ug_kth_se/EjWO0cOVmXxEqGjFQrZ0JUsBb3qFATZSMZZyRvj_Y_NPVQ?e=pBLik5 I guess that there is somewhere a line missing that gets the region from the technology of fuel string, e.g.: df['REGION'] = df['TECHNOOGY'][:2] Have started searching, but didn't find it yet.

HauHe commented 1 year ago

I added a comment in the code above. It seems like the iso_mapping is not converting the NO to Norway, but to nan.

danielhuppmann commented 1 year ago

Oh, seems that I solved this problem already a while ago - but now, the fix doesn't work any more.

"NO" is interpreted False by the yaml-import, see this line.

Not sure why the quotes do not work any more to solve this issue, will investigate.

danielhuppmann commented 1 year ago

No luck yet for a solution, but at least I know we're not alone - see paragraph 4 here

willu47 commented 1 year ago

Oh, seems that I solved this problem already a while ago - but now, the fix doesn't work any more.

"NO" is interpreted False by the yaml-import, see this line.

Not sure why the quotes do not work any more to solve this issue, will investigate.

Ah yes, this is a classic! Good debugging @HauHe!

In the meantime, I have refactored the trade calculation to be more pandas-tastic which has greatly simplified the function, and I have added a test, removed the print statements etc.

The v6.0 of PyYAML I am using on OSX does not have the issue with NO=nan.

tniet commented 1 year ago

Hi @willu47 @HauHe @danielhuppmann - I've seen this error importing yaml files to clewsy. It can be prevented with this code snippet (stolen off the internet for clewsy):

For handling unintentional boolean operators from yaml file

from yaml.constructor import Constructor
def add_bool(self, node):
    return self.construct_scalar(node)
yaml.SafeLoader.add_constructor(u'tag:yaml.org,2002:bool', add_bool)

Hope that helps!

danielhuppmann commented 1 year ago

Thanks @tniet! This direct fix doesn't work for us because we do need to load "true" and "false" as booleans, but you've set me on the right path - will make a PR to the nomenclature package in a bit to fix this problem.

danielhuppmann commented 1 year ago

Follow-up upon further investigation: it seems that l NO is parsed as string correctly by pyyaml thanks to the quotes, but it is converted to bool somewhere in the nomenclature.DataStructureDefinition processing…

danielhuppmann commented 1 year ago

The issue with Norway being False should be solved with https://github.com/IAMconsortium/nomenclature/pull/171, please install the latest of nomenclature. FYI, the culprit was pydantic, not PyYAML...

willu47 commented 1 year ago

@HauHe - I've addressed all your review comments. This should be ready to merge now.

HauHe commented 1 year ago

Thanks for the help and the bug fix in the pydantic package @danielhuppmann ! I now tested it and works. But I installed the nomenclature package editable locally, since I noticed that your bug fix is not yet in the latest release of the nomenclature package. Perhaps you could create a minor release?

HauHe commented 1 year ago

Seems like I can't do the merge. At least I don't see the option...