Several redundant functions for fitting exceedance freq to return periods

ValentinGebhart commented 5 days ago

Is your feature request related to a problem? Please describe. There are several functions that do the same computation (fitting the relation between exceedance frequency and return periods) but in different interpolation ways. They might be worth to combine. The ones I can find now are:

Impact.calc_freq_curve impact exceedance frequency curve (aggregated over centroids) method: np.interp(freq_cum, imp)
Impact.local_exceedance_imp using loc_return_impusing _cen_return_imp impact exceedance frequency per centroid for several return periods method: np.polyfit( np.log(freq_cum), imp, deg = 1)
Hazard.local_exceedance_inten using _loc_return_inten using _cen_return_inten hazard exceedance frequency per centroid for several return periods method: np.polyfit( np.log(freq_cum), haz, deg = 1)
Hazard.local_return_period using _loc_return_period return period per centroid for several threshold intensities method: np.searchsorted() (i.e. fitting a step function between haz and freq_cum)

Describe the solution you'd like We could write one or two flexible functions that do the computation for all above cases, and maybe some wrapper functions.

Describe alternatives you've considered None Additional context related to issue #209

peanutfun commented 5 days ago

We can adapt the Impact.calc_freq_curve to become a "kernel" for exceedance frequency curves that can be applied to Impact.at_event, an impact time series of a specific exposure point, or a hazard intensity time series of a specific centroid. This function can then be applied to the appropriate data in suitable wrapper functions. It can even be adapted to "flip" the axes, and return the return period for specific values of the time series (i.e., the inverse problem that it currently solves)

chahank commented 3 days ago

There is also a reason for having two methods. One is just an ordering of the values with a linear interpolation between these values Impact.calc_freq_curve. The other is a fit of a curve due to the potentially very small number of values Impact.local_exceedance_imp. Thus, I think that there remains a good reason to have two methods. I would make maybe one single method as a util function that can either interpolate or fit. Then, as @peanutfun suggested, one can use it in the classes and flip the axes if needed.

bguillod commented 1 day ago

I personally find the return period calculations rather obscure to the user, so I would welcome improvements.

Personally, in addition to the above points, I would consider three options and leave it to the user to specify which one (with a good default of course):

The interpolated case, albeit with a clear definition in which space (e.g. log-log) it is performed (or the space being an additional optional argument).
The non-interpolated case.
The fitting of an extreme value distribution.

In my view the default should probably be (1) or (3).

In addition, I would suggest the following:

If the user asks for a return period beyond the maximum one available in cases 1 and 2 above, a warning is raised and np.nan is returned, rather than returning the maximum value silently. Perhaps also in case (3) if the return period is way beyond the max available empirically, although extreme value distributions are designed for extrapolation...
The uncertainty of the estimate should ideally be optionally provided.
In case of multiple identical values, these should be grouped and their frequencies summed first.

Some of these points we've implemented on our end already, so if you undertake a PR, happy to contribute.

bguillod commented 1 day ago

(Just to be clear: when I write three options I mean as an input parameter to the function(s) rather than separate functions at the top-level code API - which doesn't prevent the options to be split into individual private functions)

CLIMADA-project / climada_python

Several redundant functions for fitting exceedance freq to return periods #904