Should we make a harmonize module?

gidden commented 3 years ago

We are currently starting to use aneris more for harmonization at various stages of analysis. When I first wrote it, I had a specific use case/project in mind, but that use is now expanding across a couple of different areas. It was also written "pre-pyam".

So, a question to our community here: would it make sense to operationally move aneris inside of pyam? I would envision this to effectively wrap the current aneris functionality and return a pyam.IamDataFrame of results (or operate in place). Currently, aneris uses a decision-tree approach by default, but to keep things simple we could allow this to be overridden by a single method argument in this interface.
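To make the proposed interface concrete, here is a minimal sketch of a harmonize entry point that picks a method by default but accepts a single method name as an override. It assumes timeseries are plain {year: value} dicts; the function names (`harmonize`, `constant_ratio`, `constant_offset`, `default_method`) and the one-line default rule are hypothetical stand-ins, not the aneris API or its actual decision tree.

```python
# Hypothetical sketch: harmonize() chooses a method by default, but a single
# `method` name can override that choice. None of these names are aneris API.
# Timeseries are plain {year: value} dicts.


def constant_ratio(model, hist, base_year):
    """Scale the whole model trajectory so it matches history in base_year."""
    factor = hist[base_year] / model[base_year]
    return {year: value * factor for year, value in model.items()}


def constant_offset(model, hist, base_year):
    """Shift the whole model trajectory by the base-year gap."""
    offset = hist[base_year] - model[base_year]
    return {year: value + offset for year, value in model.items()}


METHODS = {"constant_ratio": constant_ratio, "constant_offset": constant_offset}


def default_method(model, hist, base_year):
    """Stand-in for a decision tree: a ratio is undefined if the model is zero."""
    return "constant_offset" if model[base_year] == 0 else "constant_ratio"


def harmonize(model, hist, base_year, method=None):
    """Harmonize `model` to `hist`; an explicit `method` overrides the default."""
    name = method or default_method(model, hist, base_year)
    return METHODS[name](model, hist, base_year)


model = {2015: 8.0, 2030: 10.0, 2050: 4.0}
hist = {2015: 10.0}
print(harmonize(model, hist, 2015))  # {2015: 10.0, 2030: 12.5, 2050: 5.0}
print(harmonize(model, hist, 2015, method="constant_offset"))  # {2015: 10.0, 2030: 12.0, 2050: 6.0}
```

Keeping the override to one string argument means users who disagree with the default choice do not need to learn anything about the tree itself.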

I am wondering what others think about this. I know that harmonization is a very common operation and is probably implemented all over the place. Would it make sense to have one 'canonical' interface for it? Would you use it if there was one? Or should we just keep it separate, in its own repo?

I will admit my bias: I think it would be nice for it to be part of the community toolbox, which would also help with maintenance efforts. I do not want it to die of neglect =)

Would love any and all input, but especially from @danielhuppmann @coroa @gaurav-ganti @l-welder @znicholls @Rlamboll @jkikstra

znicholls commented 3 years ago

I actually pulled out aneris the other day to play around with, so I would absolutely love to find a way forward to keep it going. In my head, I think the key is a balance between two things:

Making repos easy to maintain by having them do one thing, and do that one thing well
Making our lives easy by maintaining fewer repos

My personal view is that the first point is more important than the second (i.e. I'd rather have lots of little repos to look after than one enormous one, for reasons I can explain in more detail if anyone is interested). In that vein, I'd be happy to jump on board with aneris to get it going again.

danielhuppmann commented 3 years ago

Great suggestion @gidden! I think that from a user perspective, having this within pyam would just make lives a lot easier, because one won't have to deal with i/o, filtering, renaming (or can just use the pyam functions) and, most importantly, one can directly use the pyam plotting library to look at the results.

I don't have a strong preference on whether this makes more sense implementation-wise as a submodule (separate repo) or as a module within pyam... But submodules have always caused problems when I dealt with them, and a dependency requires releasing aneris on both pip and conda.

coroa commented 3 years ago

(EDIT: I did not say hi on any of the available channels yet. So here it is: HI. I joined @gidden's work team two months ago and have been a regular user of aneris and pyam since. I typically have many ideas for improvements and am not too shy to propose PRs, if time permits. Happy to work together with all of you.)

I don't have a strong preference for either solution and would make the choice dependent on the envisioned interface:

If we want to have an interface like:

harmonized_model_idf = model_idf.harmonize(history=hist_idf)

then I would urge us to include the aneris modules directly in the pyam repository, rather than carrying aneris along as an explicit dependency.

If we are happy enough with an update to the aneris HarmonizationDriver to allow something like:

harmonized_model_idf = HarmonizationDriver(model_idf, hist_idf).harmonize()

then pyam does not need the aneris dependency and aneris can live happily alongside pyam.
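The difference between the two options is mostly about where the code lives. The sketch below uses hypothetical `Frame` (standing in for pyam.IamDataFrame) and `Driver` (standing in for an aneris-side driver) classes, not the real ones, to show that both call styles can produce the same result; what changes is which package has to contain or depend on the harmonization logic.

```python
# Hypothetical sketch contrasting the two interfaces. `Frame` stands in for
# pyam.IamDataFrame and `Driver` for an aneris-side driver class; neither
# reproduces the real classes' APIs.


def constant_ratio(model, hist, base_year):
    """Scale every model value by hist/model in the base year."""
    factor = hist[base_year] / model[base_year]
    return {year: value * factor for year, value in model.items()}


class Frame:
    """Minimal container of {variable: {year: value}} timeseries."""

    def __init__(self, data):
        self.data = data

    # Option 1: harmonization is a method on the data class itself, so the
    # package defining `Frame` must include (or depend on) this logic.
    def harmonize(self, history, base_year=2015):
        return Frame({var: constant_ratio(ts, history.data[var], base_year)
                      for var, ts in self.data.items()})


class Driver:
    """Option 2: a separate class that only consumes `Frame` objects, so it
    could live in its own package without `Frame` knowing about it."""

    def __init__(self, model, history, base_year=2015):
        self.model, self.history, self.base_year = model, history, base_year

    def harmonize(self):
        return Frame({var: constant_ratio(ts, self.history.data[var], self.base_year)
                      for var, ts in self.model.data.items()})


model_idf = Frame({"Emissions|CO2": {2015: 8.0, 2030: 10.0}})
hist_idf = Frame({"Emissions|CO2": {2015: 10.0}})

a = model_idf.harmonize(history=hist_idf)    # option 1
b = Driver(model_idf, hist_idf).harmonize()  # option 2
print(a.data == b.data)  # True
```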

In either case, I think it makes sense to think about making region mappings a bit more explicit in pyam (i.e., provide a simple interface to the region_mapping.csv delivered with it) in conjunction with an adaptation of aneris.

danielhuppmann commented 3 years ago

About the interface to the region mapping: this would be also highly relevant for the aggregate_region() and downscale_region() functions, which currently take a many-to-one or one-to-many mapping - but it would be better to (optionally) take a dictionary (possibly read from a csv/yaml file). Maybe start a separate issue to discuss this?
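As a rough illustration of taking the mapping as a dictionary read from a file, this sketch parses a two-column CSV into {common_region: [native_regions]} and sums values accordingly. The column names `native_region`/`common_region` and both helper functions are assumptions for illustration; they are not the format of pyam's region_mapping.csv or the signature of aggregate_region().

```python
import csv
import io
from collections import defaultdict

# Hypothetical two-column mapping file: each row assigns a native model
# region to a common (aggregate) region. Column names are assumptions.
MAPPING_CSV = """native_region,common_region
AUT,EU
DEU,EU
FRA,EU
USA,North America
CAN,North America
"""


def read_region_mapping(f):
    """Return {common_region: [native_region, ...]} from a CSV file object."""
    mapping = defaultdict(list)
    for row in csv.DictReader(f):
        mapping[row["common_region"]].append(row["native_region"])
    return dict(mapping)


def aggregate_regions(values, mapping):
    """Sum {native_region: value} into {common_region: value} per the mapping.

    Native regions absent from `values` are ignored.
    """
    return {common: sum(values[r] for r in natives if r in values)
            for common, natives in mapping.items()}


mapping = read_region_mapping(io.StringIO(MAPPING_CSV))
emissions = {"AUT": 1.0, "DEU": 7.0, "FRA": 3.0, "USA": 50.0, "CAN": 5.0}
print(mapping)  # {'EU': ['AUT', 'DEU', 'FRA'], 'North America': ['USA', 'CAN']}
print(aggregate_regions(emissions, mapping))  # {'EU': 11.0, 'North America': 55.0}
```

Read in the other direction, the same dictionary describes the one-to-many case needed for downscaling.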

jkikstra commented 3 years ago

Hi all, apologies for the delayed response.

From a user perspective, I'm very happy to see this discussion going on. Indeed, as @gidden mentions, while we now use aneris for emissions, harmonization is a common operation that can (and should?) happen much more widely (i.e. for many more variables) in many assessments across the community; one could perhaps think about harmonizing carbon prices or solar capacities.

With this in mind, it would be really great to see the core of aneris being repurposed for creating a pyam.IamDataFrame.harmonize() function (the first option argued for by @coroa). Indeed, we could pass a method as an argument. We could maybe even write a 'generalised' tree function, but I have no sense of whether such a thing is possible or makes sense (setting the parameters will very much be a matter of 'expert judgement'; @gidden might know better here?).
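One way to read the 'generalised tree' idea is to turn the expert-judgement thresholds into arguments, so one tree could be reused across variables while a method can still be passed explicitly. In the sketch below, the branching rules, the `ratio_tol` and `converge_year` parameters, and the method names are hypothetical; they are not the decision tree aneris implements.

```python
# Hypothetical "generalised" decision tree: the expert-judgement thresholds are
# arguments instead of constants. Branching rules and method names are
# illustrative assumptions, not the tree that aneris implements.
# Timeseries are plain {year: value} dicts.


def choose_method(model_base, hist_base, ratio_tol=0.1, converge_year=2050):
    """Pick a harmonization method name from base-year values."""
    if model_base == 0:
        # A ratio is undefined, so fall back to an additive correction.
        return "constant_offset"
    rel_gap = abs(hist_base - model_base) / abs(model_base)
    if rel_gap <= ratio_tol:
        # Small gap: scaling the whole trajectory changes it only slightly.
        return "constant_ratio"
    # Large gap: close it in the base year but let the correction fade out.
    return f"reduce_offset_{converge_year}"


def reduce_offset(model, hist, base_year, converge_year):
    """Apply the base-year offset, shrinking it linearly to zero by converge_year."""
    offset = hist[base_year] - model[base_year]
    span = converge_year - base_year
    out = {}
    for year, value in model.items():
        # Weight is 1 at (or before) base_year and 0 at (or after) converge_year.
        weight = min(1.0, max(0.0, (converge_year - year) / span))
        out[year] = value + offset * weight
    return out


def harmonize(model, hist, base_year, method=None, ratio_tol=0.1, converge_year=2050):
    """Harmonize one timeseries; an explicit `method` bypasses the tree."""
    m0, h0 = model[base_year], hist[base_year]
    name = method or choose_method(m0, h0, ratio_tol, converge_year)
    if name == "constant_ratio":
        return {y: v * h0 / m0 for y, v in model.items()}
    if name == "constant_offset":
        return {y: v + (h0 - m0) for y, v in model.items()}
    if name.startswith("reduce_offset_"):
        return reduce_offset(model, hist, base_year, int(name.rsplit("_", 1)[1]))
    raise ValueError(f"unknown method: {name}")


model = {2010: 10.0, 2030: 12.0, 2050: 6.0}
hist = {2010: 14.0}
print(harmonize(model, hist, 2010))                 # 40% gap -> reduce_offset_2050
print(harmonize(model, hist, 2010, ratio_tol=0.5))  # looser tolerance -> constant_ratio
```

Whether thresholds like these can be set sensibly for prices or capacities, rather than emissions, is exactly the expert-judgement question raised above; the sketch only shows where such parameters would plug in.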

IAMconsortium / pyam, issue #425