CLIMADA-project / climada_python

Python (3.8+) version of CLIMADA
GNU General Public License v3.0
321 stars 125 forks source link

haz.select() doesn't work with empty haz.fraction matrix #838

Open elianekobler opened 10 months ago

elianekobler commented 10 months ago

What I wanted to do: haz_surge = haz_surge.select(reg_id=1)

The select function checkes the region_id matrix AND the fraction matrix. Because my fraction matrix was empty, I got an index out of range error. The workaround was to fill the fraction matrix with intensity values, which is not very efficient.

chahank commented 10 months ago

Thanks for noting this! Since we now allow the fraction matrix to be empty, the select method should be adapted to take this into account. This should be an easy fix (just check if fraction is empty before slicing).

chahank commented 10 months ago

@samluethi : could you maybe implement a quick fix?

peanutfun commented 10 months ago

For reference, the issue occurs here: https://github.com/CLIMADA-project/climada_python/blob/e41222210bfb99e92829c0a86f1db2f31355abf5/climada/hazard/base.py#L1391

Any attribute (var_val) that is a CSR matrix will be sliced. If the fraction is empty, the slicing fails.

luseverin commented 8 months ago

I did not manage to reproduce the error, as the slice works even when an empty fraction matrix is set. Could it be that the error stems from the surge module directly? For instance, it could be modifying the shape of the fraction matrix which might trigger the index out of range error.

luseverin commented 8 months ago

@tovogt Could it be that the GeoClaw produces an erreoneous fraction matrix?

tovogt commented 8 months ago

I don't think so. Can you produce a complete code example that demonstrates this issue?

tovogt commented 8 months ago

I did not manage to reproduce the error, as the slice works even when an empty fraction matrix is set.

There is no slicing, but explicit indexing going on in that line. The code will fail with an empty fraction matrix, unless sel_cen and sel_ev are both empty.

luseverin commented 8 months ago

There is no slicing, but explicit indexing going on in that line. The code will fail with an empty fraction matrix, unless sel_cen and sel_ev are both empty.

Ok so maybe I am not exactly clear with what an empty matrix means in that context. This is what I tried to trigger the error:

import scipy.sparse as sp

client = Client()
#first select some arbitrary hazard from the data api
properties={
    'gcm': 'CNRM-CM6-1-HR',
    'climate_scenario': 'ssp585'
}
hazard_WS = client.get_hazard('storm_europe', properties=properties) 

hazard_WS.fraction = sp.csr_matrix(hazard_WS.intensity.shape) #replace fraction matrix by an empty matrix same size as intensity
print("Non-zero elements intensity matrix: " +str(hazard_WS.intensity.nnz) + "\nNon-zero elements fraction matrix: " +str(hazard_WS.fraction.nnz)) #check the number of non-zero elements
haz_sel = hazard_WS.select(reg_id=756) #select e.g. CH

haz_sel.plot_intensity(0, smooth=False)

Basically importing an arbitrary hazard from the data api and setting its fraction matrix to an empty matrix, with the same shape as the intensity matrix. The haz.selectseems to be working fine in that case. Are you meaning something else with "empty fraction matrix"?

tovogt commented 8 months ago

sp.csr_matrix(hazard_WS.intensity.shape) #replace fraction matrix by an empty matrix same size as intensity

This is an all-zero matrix. I assumed that an empty matrix would be sp.csr_matrix((0, 0)).

In any case, we need a complete code example that demonstrates the issue, to analyze it further.

peanutfun commented 7 months ago

This might indeed be a problem of the storm surge module. When using the Hazard.__init__ correctly, I cannot replicate the error.

@chahank @emanuel-schmid While looking into this I found an issue with the "fraction is None"-thing. When initializing with Hazard(fraction=None), the fraction matrix is set to a matrix of zeros with the shape of the intensity matrix. Later in Hazard._get_fraction, a fraction matrix with only zero entries is considered nonexistent. However, this is error prone if you consider selection: If you subselect a larger Hazard with a very sparse fraction, it might happen that you end up with a selection whose fraction is only zeros, and then the Hazard incorrectly assumes that the fraction should not be applied.

import numpy as np
from scipy.sparse import csr_matrix
from climada.hazard import Hazard, Centroids

centroids = Centroids(lat=np.zeros(2), lon=np.zeros(2), region_id=np.array([0, 1]))

hazard = Hazard(
    centroids=centroids,
    event_id=np.array([1, 2]),
    intensity=csr_matrix([[1, 1], [1, 1]]),
    fraction=csr_matrix([[0, 0], [1, 1]]),
)

hazard._get_fraction().toarray()
# array([[0, 0],
#        [1, 1]])

hazard.select(reg_id=[0])._get_fraction().toarray()
# array([[0],
#        [1]])

hazard.select(event_id=[1])._get_fraction()
# Returns None, equivalent to fraction of 1 everywhere, but should be
# array([[0, 0]])

Why did we abandon the "empty fraction"?

chahank commented 7 months ago

Thanks for looking in to this! I however do not understand the point.

a) we never had empty fraction b) the use-case of a selection with a fraction that is truly zero everywhere should imho be handled in the selection method because this is then not a valid hazard I would say. A hazard that cannot make an impact anywhere should not exist. Or do you see a use case of such an object? c) having a fraction matrix of empty values the same size as intensity allows to use all methods as is without having to introduce checks on whether the fraction is empty or not.

If there is a simpler solution happy to hear!

peanutfun commented 5 months ago

The original error seems to only occur with the StormSurge subhazard, where the fraction matrix is defined differently. We might shift this issue over to Petals.