Closed danielhuppmann closed 2 years ago
Ad 1. Ah yes, that's a blunder on my side, sorry for that. Just issued #50 that addresses this.
Ad 2. That was my initial design, but I found it clearer to read if I just have a single variable dictionary that I iterate over and then pass the kwargs to pyam.IamDataFrame.aggregate_region() instead of creating two. Would it actually bring a performance boost if we gave it a list of variables, or would that just delegate the looping from nomenclature to pyam?
Would it actually bring a performance boost if we gave it a list of variables or just delegate the looping from nomenclature to pyam?
Yes, because pyam doesn’t iterate over the variables. It calls the pandas groupby() only once for the whole list of variables.
Then the question would be how pandas implements the groupby, because at some point there has to be a loop. It might not be implemented in Python, though, and might therefore be faster than a native Python loop.
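To make the comparison concrete, here is a minimal pandas-only sketch of the two approaches. It is not pyam's actual code; the column names, variable names and values are made up for illustration, and it only shows that one groupby over a list of variables gives the same result as a Python loop with one groupby per variable.

```python
import pandas as pd

# Hypothetical long-format data; columns and values are invented for illustration.
df = pd.DataFrame(
    {
        "region": ["A", "B", "A", "B"],
        "variable": [
            "Primary Energy",
            "Primary Energy",
            "Emissions|CO2",
            "Emissions|CO2",
        ],
        "year": [2020, 2020, 2020, 2020],
        "value": [1.0, 2.0, 10.0, 20.0],
    }
)
variables = ["Primary Energy", "Emissions|CO2"]

# Option 1: Python-level loop, one filter plus one groupby per variable.
looped = pd.concat(
    [
        df[df["variable"] == v].groupby(["variable", "year"])["value"].sum()
        for v in variables
    ]
)

# Option 2: filter once, then a single groupby over all variables.
single = (
    df[df["variable"].isin(variables)]
    .groupby(["variable", "year"])["value"]
    .sum()
)
```

Both options sum over regions per variable and year. Whether option 2 is noticeably faster depends on the size of the data and the number of variables, so it would be worth timing on a realistic scenario set.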
The current implementation iterates over all common regions, creates the variable-kwargs dictionary for each, and then iterates over each variable.
Two ways to significantly improve performance:
The aggregate_region() method can take a list of variables if there are no additional arguments (weight, method, ...). So the variable-kwargs dictionary could be split into a "summed variables" list plus an "other-method variables" dictionary.
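A rough sketch of that split, in plain Python. The helper name `split_variable_kwargs` and the example variables are hypothetical and not part of nomenclature or pyam; the sketch assumes the dictionary maps each variable name to its kwargs, with an empty dict meaning a plain sum.

```python
def split_variable_kwargs(variable_kwargs):
    """Split {variable: kwargs} into a list of plain-sum variables
    and a dict of variables that need extra arguments."""
    summed, other = [], {}
    for variable, kwargs in variable_kwargs.items():
        if kwargs:
            # Needs weight, method, ... so it must be aggregated on its own.
            other[variable] = kwargs
        else:
            # No extra arguments, so it can go into the single list call.
            summed.append(variable)
    return summed, other


# Hypothetical input, for illustration only.
variable_kwargs = {
    "Primary Energy": {},
    "Emissions|CO2": {},
    "Price|Carbon": {"weight": "Emissions|CO2"},
}
summed, other = split_variable_kwargs(variable_kwargs)
```

The `summed` list could then be passed to aggregate_region() in one call per common region, while only the entries in `other` would still need a loop that passes their kwargs one variable at a time.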