New Feature Proposal: compute Kaya identity factors

zacharyschmidt commented 1 month ago

This feature would add methods to the IamDataFrame to compute Kaya identity factors according to the methodology described in Koomey et al. (2019, 2022).

KoomeyExploringBlackBox2022FINAL.pdf SupplementalinformationKoomeyExploringBlackBox-FINAL.docx

InsidetheblackboxFINAL2019.pdf AppendicesInsidetheBlackBox-v61.docx

Our idea is to add three methods to the public API of the compute module, which return Kaya variables, Kaya factors, and an LMDI decomposition. Please let me know if the compute module is not the right place for this feature!

Kaya Variables: These are produced by simple transformations of the input data variables, mostly arithmetic on the emissions and CCS input variables to get the quantities we're interested in.

Kaya Factors: These are the terms of the expanded Kaya identity, calculated from the Kaya variables.

LMDI Decomposition: The Log-Mean Divisia Index method attributes a portion of the total change in emissions from the reference scenario to the intervention scenario to each Kaya factor.
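
To make the decomposition step concrete, here is a minimal sketch of an additive LMDI on a generic product of factors. It is not the proposed pyam implementation: the function names, the factor names, and the numbers are all hypothetical and chosen only for illustration, and the sketch assumes that every factor is strictly positive in both scenarios.

```python
import math


def log_mean(a: float, b: float) -> float:
    """Logarithmic mean L(a, b) = (a - b) / (ln a - ln b), with L(a, a) = a."""
    if math.isclose(a, b):
        return a
    return (a - b) / (math.log(a) - math.log(b))


def lmdi_additive(reference: dict, intervention: dict) -> dict:
    """Attribute the change in the product of all factors to each single factor."""
    total_ref = math.prod(reference.values())
    total_int = math.prod(intervention.values())
    weight = log_mean(total_int, total_ref)
    return {
        name: weight * math.log(intervention[name] / reference[name])
        for name in reference
    }


# Hypothetical factor values, for illustration only (not taken from the papers).
reference = {
    "population": 8.0,
    "gdp_per_capita": 10.0,
    "energy_intensity": 5.0,
    "carbon_intensity": 0.06,
}
intervention = {
    "population": 8.0,
    "gdp_per_capita": 10.0,
    "energy_intensity": 4.0,
    "carbon_intensity": 0.04,
}

contributions = lmdi_additive(reference, intervention)
total_change = math.prod(intervention.values()) - math.prod(reference.values())
```

Because the logarithms of the factor ratios sum to the logarithm of the ratio of the totals, the contributions add up to the total change without a residual, and a factor that is identical in both scenarios is assigned a contribution of zero.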

Below are example tests for the method to compute Kaya variables. I hope this is enough to get the discussion started. As I progress with the development I'll update this thread with questions that come up.

import pandas as pd
import input_variable_names 
import kaya_variable_names 
import pytest

from pyam import IamDataFrame
from pyam.testing import assert_iamframe_equal
from pyam.utils import IAMC_IDX

TEST_DF = IamDataFrame(
    pd.DataFrame(
        [
            [input_variable_names.POPULATION, "million", 1000],
            [input_variable_names.GDP_PPP, "billion USD_2005/yr", 6],
            [input_variable_names.GDP_MER, "billion USD_2005/yr", 5],
            [input_variable_names.FINAL_ENERGY, "EJ/yr", 8],
            [input_variable_names.PRIMARY_ENERGY, "EJ/yr", 10],
            [input_variable_names.PRIMARY_ENERGY_COAL, "EJ/yr", 5],
            [input_variable_names.PRIMARY_ENERGY_GAS, "EJ/yr", 2],
            [input_variable_names.PRIMARY_ENERGY_OIL, "EJ/yr", 2],
            [input_variable_names.EMISSIONS_CO2_FOSSIL_FUELS_AND_INDUSTRY, "Mt CO2/yr", 10],
            [input_variable_names.EMISSIONS_CO2_INDUSTRIAL_PROCESSES, "Mt CO2/yr", 1],
            [input_variable_names.EMISSIONS_CO2_CARBON_CAPTURE_AND_STORAGE, "Mt CO2/yr", 4],
            [input_variable_names.EMISSIONS_CO2_CARBON_CAPTURE_AND_STORAGE_BIOMASS, "Mt CO2/yr", 1],
            [input_variable_names.CCS_FOSSIL_ENERGY, "Mt CO2/yr", 2],
            [input_variable_names.CCS_FOSSIL_INDUSTRY, "Mt CO2/yr", 1],
            [input_variable_names.CCS_BIOMASS_ENERGY, "Mt CO2/yr", 0.5],
            [input_variable_names.CCS_BIOMASS_INDUSTRY, "Mt CO2/yr", 0.5],
        ],
        columns=["variable", "unit", 2010],
    ),
    model="model_a",
    scenario="scen_a",
    region="World", 
)

EXP_DF = IamDataFrame(
    pd.DataFrame(
        [   
            [kaya_variable_names.POPULATION, "billion", 1.0],
            [kaya_variable_names.GNP, "billion USD_2010/yr", 6.6],
            [kaya_variable_names.FINAL_ENERGY, "EJ/yr", 8.0],
            [kaya_variable_names.PRIMARY_ENERGY, "EJ/yr", 10.0],
            [kaya_variable_names.PRIMARY_ENERGY_FF, "EJ/yr", 9.0],
            [kaya_variable_names.TFC, "Mt CO2/yr", 12.0],
            [kaya_variable_names.NFC, "Mt CO2/yr", 10.0],
        ],
        columns=["variable", "unit", 2010],
    ),
    model="model_a",
    scenario="scen_a",
    region="World", 
)

@pytest.mark.parametrize("append", (False, True))
def test_kaya_variables(append):
    """Test computing kaya variables"""

    if append:
        obs = TEST_DF.copy()
        obs.compute.kaya_variables(scenarios=['scen_a'], append=True)
        assert_iamframe_equal(TEST_DF.append(EXP_DF), obs)
    else:
        obs = TEST_DF.compute.kaya_variables(scenarios=['scen_a'])
        assert_iamframe_equal(EXP_DF, obs)

@pytest.mark.parametrize("append", (False, True))
def test_kaya_variables_empty_when_input_variables_missing(append):
    """Assert that computing kaya variables with missing input variables returns empty"""

    # keep only a subset of the required input variables
    incomplete_df = TEST_DF.filter(variable=input_variable_names.POPULATION)

    if append:
        obs = incomplete_df.copy()
        obs.compute.kaya_variables(scenarios=['scen_a'], append=True)
        assert_iamframe_equal(incomplete_df, obs)  # assert that no data was added
    else:
        obs = incomplete_df.compute.kaya_variables(scenarios=['scen_a'])
        assert obs.empty

danielhuppmann commented 1 month ago

Thanks @zacharyschmidt for the proposal! I took the liberty of editing your issue-description to 1) add the model, scenario and region dimensions directly when initializing the IamDataFrame (not as data columns), and 2) format the code as python, both to improve readability. I'll follow up with more comments later.

zacharyschmidt commented 1 month ago

Thanks @danielhuppmann! Glad you could take a first look at it.

danielhuppmann commented 1 month ago

Now with a bit more time to think this through...

1. I don't think it's a good idea to change the units associated with variables (looking at population). It is better to keep the IAMC convention and do the conversion only in the methods (you can easily use convert_unit() to do that on the fly).

2. For the implementation, you can use

   df.aggregate("Primary Energy|Fossil", ["Primary Energy|Coal", "Primary Energy|Oil", "Primary Energy|Gas"])

   to do the aggregation.

3. You can also use mathematical operations to do the computations, see this tutorial. Basically it works like
   ```
   df.<method>(a, b, c) => a <op> b = c
   ```
   where a, b and c are variables (or other dimensions if you use the dimension argument). pyam will make sure that this works with multiple models/scenarios/regions in one go, and it even keeps the units correct...

4. You can use require_data() to check whether a scenario has all relevant information before even starting the processing...

5. Making the variable names configurable is a nice feature, but I suggest defaulting to the common IAMC variables, see https://github.com/iamconsortium/common-definitions - all there except for TFC and NFC, which can quickly be added to the common-definitions repo.

zacharyschmidt commented 1 week ago

Thanks for those recommendations! I am using all of them for the implementation.

For point 5 I have a few questions. My intention with the input_variable_names module was simply to avoid repeating string literals throughout the source code. I defined the variable names as constants so I can use autocomplete instead of copy/paste. Here's the input_variable_names module.

POPULATION = "Population"
GDP_MER = "GDP|MER"
GDP_PPP = "GDP|PPP"
FINAL_ENERGY = "Final Energy"
PRIMARY_ENERGY = "Primary Energy"
PRIMARY_ENERGY_FF = "Primary Energy (fossil fuels)"
PRIMARY_ENERGY_COAL = "Primary Energy|Coal"
PRIMARY_ENERGY_OIL = "Primary Energy|Oil"
PRIMARY_ENERGY_GAS = "Primary Energy|Gas"
EMISSIONS_CO2_INDUSTRIAL_PROCESSES = "Emissions|CO2|Industrial Processes"
EMISSIONS_CO2_CARBON_CAPTURE_AND_STORAGE = "Emissions|CO2|Carbon Capture and Storage"
EMISSIONS_CO2_CARBON_CAPTURE_AND_STORAGE_BIOMASS = "Emissions|CO2|Carbon Capture and Storage|Biomass"
EMISSIONS_CO2_FOSSIL_FUELS_AND_INDUSTRY = "Emissions|CO2|Fossil Fuels and Industry"
EMISSIONS_CO2_AFOLU = "Emissions|CO2|AFOLU"
CCS_FOSSIL_ENERGY = "Carbon Sequestration|CCS|Fossil|Energy"
CCS_FOSSIL_INDUSTRY = "Carbon Sequestration|CCS|Fossil|Industrial Processes"
CCS_BIOMASS_ENERGY = "Carbon Sequestration|CCS|Biomass|Energy"
CCS_BIOMASS_INDUSTRY = "Carbon Sequestration|CCS|Biomass|Industrial Processes"

I think that specific variable names are not used at all in the existing pyam source code (except for test data), so I don't have an example to look at. Let me know what's preferred in terms of defined constants vs direct use of strings and I'll follow that.

Also, thanks for pointing me to the common-definitions repo. I'll make a pull request there to add TFC and NFC.

danielhuppmann commented 1 week ago

Right, we shouldn't hard-code anything in the actual source code - I only meant that the input_variable_names module should be consistent with common-definitions. The one conflict I see is "Primary Energy (fossil fuels)", which is usually "Primary Energy|Fossil" in IAM reporting.
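
One possible way to keep the names configurable while defaulting to IAMC-style names is sketched below. The class KayaInputNames, its fields, and the as_list() helper are hypothetical and not part of pyam; the default strings are taken from this thread and would need to be checked against common-definitions.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class KayaInputNames:
    """Hypothetical container for input variable names with IAMC-style defaults."""

    population: str = "Population"
    gdp_ppp: str = "GDP|PPP"
    final_energy: str = "Final Energy"
    primary_energy: str = "Primary Energy"
    primary_energy_fossil: str = "Primary Energy|Fossil"

    def as_list(self) -> list:
        """Return all configured names, e.g. to check that the data is complete."""
        return [getattr(self, field.name) for field in fields(self)]


# Default names, following the convention discussed above
default_names = KayaInputNames()

# A user with non-standard reporting overrides a single name
custom_names = KayaInputNames(primary_energy_fossil="Primary Energy (fossil fuels)")
```

With this layout the attribute names still give autocomplete in an editor, and a user only has to pass the names that differ from the defaults.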

IAMconsortium / pyam

New Feature Proposal: compute Kaya identity factors #875