znicholls opened this issue 4 years ago
Thanks @znicholls for the cross-reference. This is similar to the recent improvements to aggregate() (#305 & #312), which now supports a method arg (min, max, weighted sum) and "bulk" aggregation by passing a list of variables to be aggregated. The same applies to aggregate_region(), the related check_*() functions, and downscale_region().
The current implementation in pyam, though, only works with a list of variables when you do the "obvious" aggregation, i.e., summing all subcategories of each variable - whereas the implementation in silicone takes a dict(variable=components). This would be a useful addition to pyam, I think.
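To make the distinction concrete, the two calling conventions differ roughly as follows (the variable names here are illustrative, not taken from either codebase):

```python
# pyam-style "bulk" aggregation: a list of variables, each aggregated from
# all of its own subcategories (the "obvious" aggregation).
variables = ["Emissions|CO2", "Emissions|CH4"]

# silicone-style: an explicit mapping from each aggregate variable to the
# exact components it should be built from.
mapping = {
    "Emissions|CO2": ["Emissions|CO2|Energy", "Emissions|CO2|AFOLU"],
    "Emissions|CH4": ["Emissions|CH4|Energy", "Emissions|CH4|AFOLU"],
}
```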
As for the second feature, I have some concerns that this would be overloading the function. If the multiplication relates to conversion to CO2-equivalents, I'd rather use a separate function for conversion and one for aggregation, for example:
_df = df.convert_unit('Mt CH4', 'Mt CO2e', context='gwp_AR5GWP100')
_df.aggregate('Kyoto GHG', components=['Emissions|CO2', 'Emissions|CH4'])
Or the first step could use the dataframe-operations feature (work in progress by @gidden, see #333)...
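As a rough illustration of that two-step pattern in plain Python (the GWP100 factor of 28 for CH4 is an assumption here for illustration; the exact value depends on the metric context):

```python
# Step 1: convert CH4 emissions to CO2-equivalents using a GWP factor
# (28 is assumed here as the AR5 GWP100 value for CH4, for illustration only).
GWP100_CH4 = 28
ch4_mt = 10.0                    # Mt CH4
co2_mt = 500.0                   # Mt CO2
ch4_co2e = ch4_mt * GWP100_CH4   # Mt CO2e

# Step 2: aggregate the converted components into a Kyoto GHG total
kyoto_ghg = co2_mt + ch4_co2e
```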
The main use of the factors is to do subtraction with -1, e.g.
aggregate = "Emissions|CO2|Other"
other_CO2 = mi.infill_composite_values(
    sr15_data,
    {
        aggregate: {
            "Emissions|CO2": 1,
            "Emissions|CO2|Energy and Industrial Processes": -1,
            "Emissions|CO2|AFOLU": -1,
        }
    },
)
I just allow any multiplier in case people want to do other things, like weightings. It could be restricted to a sign if you want.
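In other words, each component series is scaled by its factor before summing; a minimal pure-Python sketch of the idea (the data values are made up):

```python
# Factors of -1 turn the sum into a subtraction: total CO2 minus the
# energy/industrial and AFOLU components leaves "Other" CO2.
factors = {
    "Emissions|CO2": 1,
    "Emissions|CO2|Energy and Industrial Processes": -1,
    "Emissions|CO2|AFOLU": -1,
}
data = {
    "Emissions|CO2": [40.0, 42.0],
    "Emissions|CO2|Energy and Industrial Processes": [35.0, 36.0],
    "Emissions|CO2|AFOLU": [4.0, 4.5],
}
other_co2 = [
    sum(factor * data[var][t] for var, factor in factors.items())
    for t in range(2)
]
# other_co2 == [1.0, 1.5]
```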
Yep, I completely agree with all of that. I wasn't actually thinking of altering aggregate directly, rather creating a new method or utility function that would wrap the operations done by @gidden and/or aggregate. Does that seem a sensible way forward to you?
I'm not quite sure what you have in mind and whether it's worth the additional maintenance overhead if it's just a wrapper for two or three existing functions.
Can you specify the suggested function name and the API (kwargs and returned object)?
"whether it's worth the additional maintenance overhead if it's just a wrapper for two or three existing functions"

I'm also not sure.
My current thoughts (@Rlamboll may have others):

def bulk_aggregate(iamdf, aggregates):
    """
    Aggregate variables within a :obj:`pyam.IamDataFrame`

    This convenience function allows a number of aggregate variables to be
    calculated from the data within a :obj:`pyam.IamDataFrame`. The
    aggregation is flexible, allowing users to write potentially complex
    algorithms.

    Parameters
    ----------
    iamdf : :obj:`pyam.IamDataFrame`
        :obj:`pyam.IamDataFrame` containing the data from which the
        aggregates can be calculated
    aggregates : dict{str: dict{str: float}}
        Dictionary specifying how to calculate the aggregates. Each key is
        the name of an aggregate variable to be calculated. Each value is
        itself a dictionary, whose keys are variables which already exist in
        ``iamdf`` and whose values are constants which are multiplied by the
        value of that variable's data before being included in the aggregate
        (i.e. sum).

    Returns
    -------
    :obj:`pyam.IamDataFrame`
        :obj:`pyam.IamDataFrame` containing the aggregate data (can be
        appended to the source :obj:`pyam.IamDataFrame` by the user if
        desired).

    Examples
    --------
    # simply take the aggregate of multiple variables
    bulk_aggregate(
        iamdf=input_df,
        aggregates={
            "Emissions|CO2": {
                "Emissions|CO2|Industrial": 1,
                "Emissions|CO2|AFOLU": 1,
            },
            "Emissions|CH4": {
                "Emissions|CH4|Industrial": 1,
                "Emissions|CH4|AFOLU": 1,
            },
        },
    )

    # one variable is the difference between two others, one is the sum of
    # one variable plus two times another
    bulk_aggregate(
        iamdf=input_df,
        aggregates={
            "Emissions|CO2|AFOLU": {
                "Emissions|CO2": 1,
                "Emissions|CO2|Industrial": -1,
            },
            "Emissions|SOx (RF weighted)": {
                "Emissions|SOx|Industrial": 2,
                "Emissions|SOx|AFOLU": 1,
            },
        },
    )
    """
Yeah, I don't know that there's a pressing need for it other than infilling values defined by consistency conditions, which is very much something you'd download silicone to do. I don't mind it going into pyam wholesale, but it feels time-consuming to duplicate it in both pyam and silicone with slightly different options concerning the inputs.
Just for completeness, a big benefit of having it in pyam is that you have a bigger team of maintainers and users. Happy with whatever though, just wanted to ask the question.
Yes, pushing features that would be useful to many users "upstream" is definitely welcome in principle - maybe this one is just too specific to the infilling use-case with factors.
But let me reiterate that an extension of aggregate() and similar functions to take a mapping dictionary would be welcome, i.e., if you have a mapping = {variable: [<list of components>]}, currently one needs to do:

for v, lst in mapping.items():
    df.aggregate(v, components=lst)

This could be streamlined to allow:

df.aggregate(mapping)
In https://github.com/znicholls/silicone/pull/72, @Rlamboll has added a feature to do bulk aggregation within an IamDataFrame. This is a convenience function, but it means you can quickly specify a bunch of aggregate variables to calculate (some of which are more complex than just a pure sum) without writing a custom wrapper every time (current implementation here, work in progress though).

@danielhuppmann Does this feature already exist? If not, is it something you'd be interested in bringing across?