Open mncosta opened 2 months ago
Dear Miguel,
CLIMADA is originally built to be able to handle large amounts of events (think of large probabilistic sets). Thus, slicing over events is important in that setting. Only recently have applications with a focus on singular large events emerged for which a centroid slicing might be more important.
Hi,
Not sure if this belongs here, but I have a question about Climada's performance and either I am missing something or I believe an improving could be made to the use of hazard intensities. Searching through the repositories' issues I only found a reference to this on #874, stating
A significant amount of time is spent slicing the hazard intensity matrix
, which is related to what I found.As I found, currently
Hazard.intensity
is stored as a "row vector"spicy.sparse
matrix in CSR format. I'm making this assumption from the provided example in here, specifically the following code section in Part 1:Inspecting
haz_ven.intensity
yields the following format:Yet, this appears to be a 1 row x 1032226 columns matrix, in essence making the CSR format not the most efficient when slicing/indexing is needed. Why are hazard intensities saved in CSR format and not CSC format? In essence this would making slicing much more efficient. Time-wise would be a massive improvement if hazard matrices are huge. Again, with the above hazard example and with a simple indexing in CSR vs. CSC format:
Which yields:
This is ~12 times faster for this very small problem. In the "real" problem I'm currently working I can get even a bigger speed up with a matrix of shape 1x1072057600, which takes about ~4.5s to index ~20k cells, whereas with CSC format I can slice in about 400 µs.
Thus, why are intensities stored in this manner? Do you require them to be this way for other calculations? Is there a faster way to slice intensities given an array of centroids that I'm missing?