CLIMADA-project / climada_python

Python (3.8+) version of CLIMADA
GNU General Public License v3.0

Multiple calls to `ImpactCalc` surprisingly slow #874

Open peanutfun opened 2 months ago

peanutfun commented 2 months ago

TLDR: Calling ImpactCalc.impact multiple times and concatenating the results is much slower than calling it once on a larger event set. This has some unfortunate implications for use cases where the exposure or the impact function changes between hazard events.

Problem

In my use case I have a hazard event set with one event per year (flood footprints). I also have one exposure layer per year (WorldPop data). To compute an impact over all years, I must call ImpactCalc independently for each exposure layer, because it does not support multiple exposures. I then concatenate the results into a single impact.

This was my general idea on how to implement this:

from climada.engine import Impact, ImpactCalc

# Usual setting (one exposure, multiple hazard events)
impact = ImpactCalc(exposure, impfset, hazard).impact(save_mat=False, assign_centroids=False)

# Multi-exposure (one for each hazard event)
# NOTE: This is conceptually the same work as before, but now with different exposure values
#       for each event. exposure_map maps each event ID to that event's exposure.
impact = Impact.concat([
    ImpactCalc(
        exposure_map[event_id],
        impfset,
        hazard.select(event_id=[event_id]),  # hazard reduced to this single event
    ).impact(save_mat=False, assign_centroids=False)
    for event_id in hazard.event_id.flat
])

I was quite bummed to see that in this case, the impact calculation took more than 5 times longer than an impact calculation with a single exposure and multiple hazard events. Note that a factor of 5 might mean 5s vs 25s for a single impact calculation, but 5min vs 25min for a calibration.

Profiling results

I created a stub test case. It does not use real data, and in this case, the slowdown was "only" a factor of 2, see my code below. I made sure that I assigned centroids beforehand, I did not save the impact matrix, and I only profiled the actual impact calculation. Profiling visualized with snakeviz.
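The profiling setup described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the stub test case from this issue: `fake_impact` is a made-up stand-in with a fixed per-call cost, and the helper names are hypothetical. It shows how to profile only the section of interest and how the number of calls differs between the two approaches.

```python
import cProfile
import pstats


def fake_impact(event_ids):
    """Hypothetical stand-in for an impact calculation (NOT CLIMADA code)."""
    _ = sorted(range(20_000), reverse=True)  # fixed per-call overhead
    return [2.0 * event_id for event_id in event_ids]  # per-event work


def single_call(event_ids):
    # One call covering all events
    return fake_impact(event_ids)


def per_event_calls(event_ids):
    # One call per event, results concatenated afterwards
    result = []
    for event_id in event_ids:
        result.extend(fake_impact([event_id]))
    return result


def profile(func, *args):
    """Profile only the call to func, nothing before or after it."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    return result, profiler


def call_count(profiler, func_name):
    """Total number of calls recorded for functions named func_name."""
    stats = pstats.Stats(profiler).stats
    return sum(entry[1] for key, entry in stats.items() if key[2] == func_name)


event_ids = list(range(50))
single_result, single_prof = profile(single_call, event_ids)
multi_result, multi_prof = profile(per_event_calls, event_ids)

# Show the most expensive calls of the per-event variant
pstats.Stats(multi_prof).sort_stats("cumulative").print_stats(5)

# To inspect a profile in snakeviz, dump it to a file first:
# multi_prof.dump_stats("multi_exposure.prof")  # then run: snakeviz multi_exposure.prof
```

Both variants return the same values, but the fixed overhead is paid once in the first case and once per event in the second, which is the pattern to look for in the real profiles.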

Single exposure:

[Screenshot 2024-04-19 at 17 28 49: snakeviz profile of the single-exposure run]

Multi-exposure:

[Screenshot 2024-04-19 at 17 29 07: snakeviz profile of the multi-exposure run]

Results:

Discussion points

Code

https://polybox.ethz.ch/index.php/s/y2aqu3JX4w6qgfy

chahank commented 2 months ago

Interesting! Just some quick feedback:

In brief, the current setup was obtained after trying to optimize computation times for a large number of different use cases (small/large exposures, small/large hazards). But there is certainly room for improvement! Maybe one could make the slicing inside ImpactCalc optional? Alternatively, what I had in mind while writing the module was to add other types of computation methods over time, here something like ImpactCalc.multi_exposures_impact.
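To make the idea of a dedicated multi-exposure method more concrete, here is a rough NumPy sketch of the underlying arithmetic. It is not CLIMADA code, and `ImpactCalc.multi_exposures_impact` does not exist: the function names, the linear impact function, and the dense arrays are all assumptions for illustration (real hazard intensities are stored as sparse matrices). The point is only that one exposure row per event can be handled in a single vectorized pass instead of one call per event.

```python
import numpy as np


def mean_damage_ratio(intensity):
    """Hypothetical impact function: linear from 0 at intensity 0 to 1 at intensity 4."""
    return np.interp(intensity, [0.0, 4.0], [0.0, 1.0])


def event_impacts(intensity, values):
    """Total impact per event.

    intensity: array of shape (n_events, n_points)
    values:    shape (n_points,) for a single exposure, or
               shape (n_events, n_points) for one exposure per event
    """
    # Broadcasting handles both the single- and the multi-exposure case
    return (mean_damage_ratio(intensity) * values).sum(axis=1)


def event_impacts_loop(intensity, values_per_event):
    """Same result, computed with one call per event and concatenated."""
    return np.concatenate([
        event_impacts(intensity[i:i + 1], values_per_event[i])
        for i in range(intensity.shape[0])
    ])


rng = np.random.default_rng(0)
intensity = rng.uniform(0.0, 5.0, size=(30, 1000))          # 30 events, 1000 points
values_per_event = rng.uniform(1.0, 100.0, size=(30, 1000))  # one exposure per event

vectorized = event_impacts(intensity, values_per_event)
looped = event_impacts_loop(intensity, values_per_event)
```

Both paths give the same per-event impacts. Whether a real implementation could be this simple depends on how centroid assignment and impact function lookup differ between the exposures.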

peanutfun commented 1 month ago

@chahank Could you supply the Jupyter script and data you used for benchmarking ImpactCalc?

chahank commented 1 month ago

It is a bit messy, so let me clean it up and send it directly.