CLIMADA-project / climada_python

Python (3.8+) version of CLIMADA
GNU General Public License v3.0
291 stars 115 forks source link

Add functions to compute, plot, store the local hazard exceedence intensity and RP maps (new branch) #898

Open ValentinGebhart opened 1 week ago

ValentinGebhart commented 1 week ago

Changes proposed in this PR (new version of old PR #857):

Earlier parts removed (commented out in code) until agreement:

This PR fixes #854 (new version of old PR #857)

PR Author Checklist

PR Reviewer Checklist

peanutfun commented 1 week ago

I am not sure it makes sense that the plotting and saving methods always call and recalculate the values. Maybe we should rather store the values as a new attribute. Do you know how much computation this costs for large datasets? @emanuel-schmid @peanutfun : what do you think, would it be ok if the method local_return_period adds an attribute to the hazard object?

I agree that the values should not be calculated in the plotting method. Instead, the user should insert the data to plot into the appropriate function.

I very much dislike adding new attributes after the class initialization. I would return the data to the user, who should take care of "bookkeeping" themselves. They can then plug the values into a plotting function, store them separately, etc.

chahank commented 1 week ago

I agree that the values should not be calculated in the plotting method. Instead, the user should insert the data to plot into the appropriate function.

I very much dislike adding new attributes after the class initialization. I would return the data to the user, who should take care of "bookkeeping" themselves. They can then plug the values into a plotting function, store them separately, etc.

I also agree with you that adding new attributes after the class initialization is a bad habit. Here the point is that plotting the intensity return period on the map requires the centroids, and thus having it in the hazard object is convenient... not sure what the best solution is.

ValentinGebhart commented 1 week ago

I very much dislike adding new attributes after the class initialization. I would return the data to the user, who should take care of "bookkeeping" themselves. They can then plug the values into a plotting function, store them separately, etc.

I also agree with you that adding new attributes after the class initialization is a bad habit. Here the point is that plotting the intensity return period on the map requires the centroids, and thus having it in the hazard object is convenient... not sure what the best solution is.

One way of returning data to the user without losing the centroids could be to return a geodataframe?

peanutfun commented 1 week ago

@ValentinGebhart Great idea! Returning a GeoDataFrame with the return period information and the centroids geometry should contain all data necessary for the plot.

Users can then use the geopandas functions for plotting and writing. Maybe this solves all issues? πŸ˜…

emanuel-schmid commented 1 week ago

@ValentinGebhart I agree the idea is not bad - but returning a GeoDataFrame is still somewhat fuzzy. A cleaner solution would be to return a tuple (intensity return period, centroids geometry) or a class with two attributes, intensity_return_period and geometry. This seems cleaner. Ideally any requirement should be explicit in a method's signature and not hidden away in an object that could contain anything at all, like a dict or a dataframe.

chahank commented 1 week ago

@emanuel-schmid I strongly disagree with this take as is. The proposed outputs (tuples or custom class) maybe would be cleaner from a programming point of view, but it would make the output mostly useless from a user perspective. I do not see the problem with giving back a geodataframe. Can you maybe explain why this would be a bad idea?

Besides, we already have a precedent with the method impact_at_reg that returns a DataFrame :D

emanuel-schmid commented 1 week ago

Yeah. Agreed. Although I must stress that the principle holds - in this particular case a dataframe seems alrightish. 😁

chahank commented 1 week ago

Yeah. Agreed. Although I must stress that the principle holds - in this particular case a dataframe seems alrightish. 😁

Indeed. I think there is a general question on how climada objects should return extra information. We had the questions about Centroids and things like pixel sizes, Impacts and regional values, or here Hazards and return period.

ValentinGebhart commented 4 days ago

In the new commit, local_return_period outputs a gdf and I add a function subplots_from_gdf to util.plot that plots subplots from the different columns of a gdf. I thought it would be nice to have this plot function as general as possible (so it could plot return periods and exceedance frequencies and more), but if this is too messy, we can also write a more specific "plot_return_period" function.

I was not sure how to best include the meta data about what is written in the gdf and the corresponding units in the gdf. We decided to try the gdf.columns.name attribute, but I feel that how I did it now might not be the best way. To be discussed.

I also removed the write functions (only commented out for now) because we might want to generalize them to general gdf.