Hazard.intensity in CSR format and not CSC format

Hi,

Not sure if this belongs here, but I have a question about Climada's performance and either I am missing something or I believe an improving could be made to the use of hazard intensities. Searching through the repositories' issues I only found a reference to this on #874, stating A significant amount of time is spent slicing the hazard intensity matrix, which is related to what I found.

As I found, currently Hazard.intensity is stored as a "row vector" spicy.sparse matrix in CSR format. I'm making this assumption from the provided example in here, specifically the following code section in Part 1:

%matplotlib inline
import numpy as np
from climada.hazard import Hazard
from climada.util.constants import HAZ_DEMO_FL
# to hide the warnings
import warnings
warnings.filterwarnings('ignore')

# read intensity from raster file HAZ_DEMO_FL and set frequency for the contained event
haz_ven = Hazard.from_raster([HAZ_DEMO_FL], attrs={'frequency':np.ones(1)/2}, haz_type='FL')
haz_ven.check()

Inspecting haz_ven.intensityyields the following format:

<1x1032226 sparse matrix of type '<class 'numpy.float32'>'
    with 161671 stored elements in Compressed Sparse Row format>

Yet, this appears to be a 1 row x 1032226 columns matrix, in essence making the CSR format not the most efficient when slicing/indexing is needed. Why are hazard intensities saved in CSR format and not CSC format? In essence this would making slicing much more efficient. Time-wise would be a massive improvement if hazard matrices are huge. Again, with the above hazard example and with a simple indexing in CSR vs. CSC format:

# taking some random indexes to slice
idx = np.random.randint(0, 1032226, 100)

# get hazard.intensity in csc format
csc_format = haz_ven.intensity.tocsc()

# profiling slicing
%timeit csc_format[:, idx]
%timeit haz_ven.intensity[:, idx]

Which yields:

58.3 µs ± 71.5 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
711 µs ± 1.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

This is ~12 times faster for this very small problem. In the "real" problem I'm currently working I can get even a bigger speed up with a matrix of shape 1x1072057600, which takes about ~4.5s to index ~20k cells, whereas with CSC format I can slice in about 400 µs.

Thus, why are intensities stored in this manner? Do you require them to be this way for other calculations? Is there a faster way to slice intensities given an array of centroids that I'm missing?

CLIMADA-project / climada_python

Hazard.intensity in CSR format and not CSC format #938