IAMconsortium / pyam

Analysis & visualization of energy & climate scenarios
https://pyam-iamc.readthedocs.io/
Apache License 2.0

Integration with ixmp - function name? #375

Open danielhuppmann opened 4 years ago

danielhuppmann commented 4 years ago

An IamDataFrame can already be initialised directly from an ixmp.Scenario (via the ixmp-Python-API, see here).

Now, I'm thinking about adding a function in pyam to save an IamDataFrame to an ixmp database, probably df.to_ixmp(mp, version='new'). This would be useful for the scenario-explorer processing pipeline.

More importantly, it would be useful to first validate that all regions in the IamDataFrame exist in the connected database (the code would look something like set(df.regions()).difference(mp_public.regions().region)). Similar functions for units would also be helpful.
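The region check described above boils down to a set difference. A minimal sketch, using plain lists as stand-ins for df.regions() and mp_public.regions().region; the helper name missing_regions and the region names are hypothetical, chosen only for illustration:

```python
def missing_regions(df_regions, platform_regions):
    """Return, sorted, the regions used in the data but unknown to the platform."""
    return sorted(set(df_regions) - set(platform_regions))


# Stand-in data (hypothetical region names).
in_dataframe = ["World", "Europe", "Atlantis"]
in_database = ["World", "Europe", "Asia"]

missing = missing_regions(in_dataframe, in_database)
if missing:
    print(f"Regions not defined in the database: {missing}")
```

Regions that exist only in the database ("Asia" here) are not an error for this check; only names used in the data but absent from the database are reported.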

The question is: what would be a proper name(space) for such utility functions?

Or put these functions in ixmp?

gidden commented 4 years ago

I would suggest putting them in ixmp to minimize maintenance overlaps between the two.

khaeru commented 4 years ago

> I would suggest putting them in ixmp to minimize maintenance overlaps between the two.

Hm, I might disagree. At the moment ixmp does not have a dependency on pyam (message_ix.reporting targets pyam, but that's an optional/extra dependency). I'm not familiar with the pyam codebase, but it seems like ixmp is not in setup.py and is only a protected/optional import in read_ixmp.py.

For ease of maintenance, I think it would be better to continue this state of affairs, i.e. avoid introducing mutual dependencies. (Note this is distinct from testing to make sure the packages work well together in all the expected ways—we should definitely continue/expand that.)
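For readers unfamiliar with the "protected/optional import" idea, here is a generic sketch of the pattern. It is not the actual content of read_ixmp.py; the helper names optional_import and require_ixmp are hypothetical.

```python
import importlib


def optional_import(name):
    """Return the module called `name`, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Attempt the import once; the rest of the module works either way.
ixmp = optional_import("ixmp")


def require_ixmp():
    """Fail with a clear message only when an ixmp feature is actually used."""
    if ixmp is None:
        raise ImportError("this feature needs the optional dependency 'ixmp'")
    return ixmp
```

With this arrangement the package installs and imports without ixmp, and only the code paths that call require_ixmp() depend on it being present.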

I think the pieces can go entirely in pyam, using Pythonic duck typing based on the ixmp Platform API.

  • pyam.ixmp.validate_regions(df)

I like this idea of a pyam.ixmp.[name]—or IamDataFrame.ixmp.[name]—namespace. This echoes the 'accessor' pattern used in xarray that seems well-considered, and gives you a pattern for future compatibility code for other packages (e.g. idf.sdmx.convert() or similar)
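A rough sketch of what such an accessor namespace could look like, using a plain property rather than any registration mechanism. Everything here is a stand-in: Frame plays the role of IamDataFrame, FakePlatform mimics only the regions().region attribute mentioned in this thread, and missing_regions is a hypothetical method name.

```python
from types import SimpleNamespace


class IxmpAccessor:
    """Hypothetical namespace collecting ixmp-related helpers for a data object."""

    def __init__(self, obj):
        self._obj = obj

    def missing_regions(self, mp):
        # Duck typing: `mp` only needs a regions() method whose result
        # has a `region` attribute; ixmp itself is never imported.
        return sorted(set(self._obj.regions()) - set(mp.regions().region))


class Frame:
    """Stand-in for IamDataFrame (assumption: exposes a regions() method)."""

    def __init__(self, regions):
        self._regions = list(regions)

    def regions(self):
        return self._regions

    @property
    def ixmp(self):
        # Each access returns an accessor bound to this object.
        return IxmpAccessor(self)


class FakePlatform:
    """Stand-in for an ixmp Platform, for illustration only."""

    def regions(self):
        return SimpleNamespace(region=["World", "Europe"])


idf = Frame(["World", "Mars"])
print(idf.ixmp.missing_regions(FakePlatform()))
```

Keeping the helpers behind idf.ixmp leaves the main class uncluttered and makes it obvious which methods depend on the other package.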

For implementation I'd suggest some general functions:

def diff(a, b):
    _a = set(a)
    _b = set(b)

    return _a - _b, _a & _b, _b - _a

def compare(idf, mp, name):
    if name == 'region':
        a = idf.regions()
        # Duck typing: fails if mp is not the right kind of object.
        # Could catch AttributeError and raise as TypeError,
        # without any need to import ixmp.
        b = mp.regions().region
    elif name == 'unit':
        # …
        raise NotImplementedError(name)
    else:
        raise ValueError(f'unknown column: {name!r}')
    return diff(a, b)

Then:

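def validate(idf, mp, column_or_columns=('region', 'unit')):
    # Accept either a single column name or a sequence of names
    # ('region', 'unit', ...).
    if isinstance(column_or_columns, str):
        columns = [column_or_columns]
    else:
        columns = list(column_or_columns)
    valid = True
    for col in columns:
        only_pyam, both, only_ixmp = compare(idf, mp, col)
        valid &= len(only_pyam) == len(only_ixmp) == 0
    return valid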

to be called like idf.ixmp.validate(mp) or idf.ixmp.validate(mp, 'region') or idf.ixmp.compare(mp, 'region') according to what the user/other pyam code needs to do.

(By making compare() atomic, its return value could later help to e.g. update the Platform with only missing units from only_pyam.)
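To illustrate both the TypeError conversion and the "add only what is missing" idea, here is a self-contained sketch. It makes several assumptions: FakePlatform is a stand-in, its units() and add_unit() methods are assumed names that should be checked against the ixmp Platform documentation, and the units are passed as a plain list because the unit branch of compare() is left open in this thread.

```python
def diff(a, b):
    """Split two collections into (only in a, in both, only in b)."""
    _a, _b = set(a), set(b)
    return _a - _b, _a & _b, _b - _a


class FakePlatform:
    """Stand-in exposing only what the sketch needs; method names are assumed."""

    def __init__(self, units):
        self._units = list(units)

    def units(self):
        return list(self._units)

    def add_unit(self, unit):
        self._units.append(unit)


def platform_units(mp):
    # Duck typing: turn a missing method into a clearer TypeError,
    # without importing ixmp.
    try:
        return mp.units()
    except AttributeError as e:
        raise TypeError(f"{mp!r} does not look like a Platform") from e


def add_missing_units(data_units, mp):
    """Add units found only in the data to the platform; return what was added."""
    only_data, both, only_mp = diff(data_units, platform_units(mp))
    for unit in sorted(only_data):
        mp.add_unit(unit)
    return sorted(only_data)


mp = FakePlatform(["EJ/yr"])
added = add_missing_units(["EJ/yr", "Mt CO2/yr"], mp)
print(f"Added to platform: {added}")
```

Because diff() returns all three parts, the same result can drive a strict validation, a report of unused platform entries, or an update step like this one.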

danielhuppmann commented 4 years ago

Thanks @khaeru, this sounds reasonable. Indeed, pyam depends on ixmp (as an optional dependency), so adding a reverse dependency (as suggested by @gidden) might cause problems.