CLIMADA-project / climada_python

Python (3.8+) version of CLIMADA
GNU General Public License v3.0
316 stars 123 forks source link

Impact class: add methods 'select' and 'append'? #67

Closed sameberenz closed 2 years ago

sameberenz commented 4 years ago

Hi everyone

Is there a any reasonable way to filter or combine CLIMADA Impact instances at the moment?

My guess is that the Impact would require methods comparable to Hazard.select() and Hazard.append(), right?

Here is an example what I would like to do:


impact = Impact()
impact.calc(...)
impact_subset1 = impact.select(years=[2005,2006,2008,2020,2021])
impact_subset2 = impact.select(event_names=['Petra', 'Antoinette', 'Hugo', 'Sergej'])

and impact_combined = impact_subset1.append(impact_subset2)

When filtering or combining two impact sets, we would need in any case to recompute the values of aai_agg, tot_value, and frequency, right? Maybe filtering/combining the hazard sets and re-running impact.calc() in a loop is a valid alternative?

For my application, however, I want to filter and re-combine my impact instances several times during analysis without rerunning impact.calc() all the time. Therefore, filtering hazard set beforehand is not my favourite option.

Are there any suggestions other than writing such methods in impact.py? Would it make sense to create a new module for "impact_manipulation"?

Thanks in advance for your inputs

ChrisFairless commented 4 years ago

@sameberenz was right when he mentioned this could be relevant to other issues (being discussed on slack). it would be cool if we could design something that let you subset/reweight an impact set to answer questions such as

sometimes you would want to create a new Impact object (as in Sam's example), and sometimes you'd just want to reweight and recalculate statistics from an existing one (e.g. for large event sets with many reweightings)

sameberenz commented 4 years ago

Weights as suggested by @ChrisFairless could work for filtering (set weight to 0 for events you don't want to consider, and 1 for the others).

For combining/appending Impact sets, a seperate method would still be required next to weights.

NB: A work-around to filter out events within an Impact set is setting frequency to 0, however, this is not what frequency is intended for.

davidnbresch commented 4 years ago

as for combining impacts, you might have a brief look at the OLD MATLAB version too (for feature inspiration, not the code):

https://github.com/davidnbresch/climada/blob/master/code/helper_functions/climada_EDS_combine.m https://github.com/davidnbresch/climada/blob/master/code/helper_functions/climada_EDS_combine.m

Prof. Dr. David N. Bresch Professor for Weather and Climate Risks Institute for Environmental Decisions ETH Zurich/MeteoSwiss dbresch@ethz.ch mailto:dbresch@ethz.ch, dbresch@meteoswiss.ch mailto:dbresch@meteoswiss.ch office: +41 44 632 77 87 mobile: +41 79 770 15 92 skype: davidnbresch zoom: https://ethz.zoom.us/my/ https://ethz.zoom.us/my/dbreschdbresch https://ethz.zoom.us/my/dbresch https://www.usys.ethz.ch/en/people/profile.david-bresch.html http://www.wcr.ethz.ch http://www.wcr.ethz.ch/

On 21 Sep 2020, at 12:55, Samuel Eberenz notifications@github.com wrote:

Weights as suggested by @ChrisFairless https://github.com/ChrisFairless could work for filtering (set weight to 0 for events you don't want to consider, and 1 for the others).

For combining/appending Impact sets, a seperate method would still be required next to weights.

NB: A work-around to filter out events within an Impact set is setting frequency to 0, however, this is not what frequency is intended for.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/CLIMADA-project/climada_python/issues/67#issuecomment-696041272, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABW3G5A4AA4L2BGWUWZ25LTSG4WIPANCNFSM4RUKPLSQ.

ThomasRoosli commented 4 years ago

Hi Sam, Interesting question. I see two issues, the first is the recalculation of aai_agg, the validity of calc_freq_curve() etc. as you mentioned, the second is appending impacts of different haz_types, exposures, vulnerability and how to properly document that in tag attributes etc...

I think therefore, it would make more sense in some of your mentioned cases to define a function that outputs a result in e.g., a pandas.dataframe, and not a Impact class method that generates a new Impact object that needs to fullfill all requirements of "being an object of the Impact class".

basically: impact_at_event_combined_df = climada.entity.combine_impacts(impact_subset1, impact_subset2)

mmyrte commented 4 years ago

There is a (currently private) function Impact()._build_exp() which is used for plotting. Basically, it creates a Geodataframe of the same form the Exposure objects take. As soon as you have the impact in a GDF, subsetting etc becomes trivial, since everyone knows how to work with tabular data.

@emanuel-schmid - you mentioned at one point the "composition over inheritance" principle, which means that the more-or-less tabular data in the Impact class should not go the same way as the exposures and thus not inherit from GDF. Would you agree that a to_geodataframe method would be valuable here?

chahank commented 4 years ago

While this certainly works for certain application, that does not necessarily solve the problem. One requirement I think is to have access for instance to the .calc function of the impact. This would not be the case if you have a Geodataframe. To me it looks like this solution is equivalent to modify the Entity and Hazard objects before creating the Impact object.

mmyrte commented 4 years ago

Not entirely, because you could filter on the impact values themselves. But you're right, you'd need to either translate the filtered geopandas back into an Impact instance, or implement filter/append methods for a collection of attributes that may or may not grow in the future. Both seem equally messy to me. I just like thinking in tables, that's why I suggested it.

chahank commented 4 years ago

Oh, I fully agree that everything that can be done with tables (or matrices) is great and it is a good a idea to do it that way.

ChrisFairless commented 4 years ago

i mostly agree with @ThomasRoosli and @mmyrte about a function that applies a weighting and returns a data frame/geodataframe/plot, rather than modifying the Impact object.

i don't think the select part of the question needs us to think about situations where an Impact's exposure, hazard or IFs change, and therefore we don't need to think about re-running calc. That's not to say that we shouldn't think about those cases, just that I think methods that work with precomputed impacts are really fast and useful, and they solve the select() part of the problem here (though not the append).

ChrisFairless commented 4 years ago

sorry, didn't mean to close the issue. comment-close is basically a mic drop for GitHub ✋ 🎤

chahank commented 4 years ago

So, basically you propose to extract the imp_mat and work with this exclusively? It is of course a way to go, but kind of beats the purpose of the code no? We have this nice impact object, with attached plot function etc. Is it really sensible to not use that and do everything manually?

ChrisFairless commented 4 years ago

the difficulty then is, say i have 100 sets of weightings i want to calculate new statistics for. how should we do that? for each set of weightings, we could run the calculation and

i think the last is the easiest to implement and the most flexible. i don't think it makes plotting harder either since it won't be expensive to add a weight parameter to these methods and call them when you need them. it's probably cheaper and programatically better than reweighting and recalculating imp_mat before plotting results every time.

sameberenz commented 4 years ago

So what you are proposing is adding a weight parameter (array with one weight per event) to the Impact instance and apply the weight within the methods that compute return periods, EAI, or make plots, etc.?

ChrisFairless commented 4 years ago

i don't know if we'd make weight part of the Impact object or not, probably not if we allow for multiple calculations with different weights. we could also do this by directly modifying frequency, but i can see possible problems with that.

this is one of several possible solutions, though, and other people know the model better than me.

chahank commented 4 years ago

@sameberenz Have you come up with a preferred solution to your problem?

sameberenz commented 4 years ago

For my current project, I am currently inclined towards a lean local solution where I save weights mapped to event IDs seperately (in a dictionary or dataframe) and refer to the weights when post-processing. On the long run, an integrated weight that can be considered by plotting and statistics-methods of the Impact-class would be my go-to solution.

chahank commented 3 years ago

This was addressed in PR #93 . @sameberenz is the implemented solution satisfying or do you see further improvement possibilities? If not, could you please close this issue?