Closed pawelbielski closed 4 years ago
Using joblib Parallel() led to a speedup of 19.5% compared to np.apply_along_axis on a system with 4 cores.
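For reference, a minimal sketch of what this per-cell Parallel() approach could look like. The `sim_fun` stand-in, the function name `similarity_parallel`, and the `(n_lat, n_lon, n_time)` grid layout are all assumptions for illustration, not the actual code from this repo:

```python
import numpy as np
from joblib import Parallel, delayed


def sim_fun(series_a, series_b):
    # Hypothetical stand-in for the external similarity function
    return float(np.corrcoef(series_a, series_b)[0, 1])


def similarity_parallel(grid, reference, n_jobs=4):
    """Call sim_fun for every (lat, lon) cell in parallel.

    grid: array of shape (n_lat, n_lon, n_time)
    reference: array of shape (n_time,)
    """
    n_lat, n_lon, _ = grid.shape
    # Flatten the spatial dimensions so each cell becomes one task
    flat = grid.reshape(-1, grid.shape[-1])
    results = Parallel(n_jobs=n_jobs)(
        delayed(sim_fun)(cell, reference) for cell in flat
    )
    return np.array(results).reshape(n_lat, n_lon)
```

Note that this creates one tiny task per cell, so the per-task multiprocessing overhead eats into the gains, which would be consistent with the modest 19.5% observed here.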
@pierretoussing From my experience, joblib works well only if the chunks to parallelize are big enough to amortize the multiprocessing overhead. Do you think that looping over latitudes in multiprocessing fashion, and over longitudes in iterative fashion, could bring some improvement?
@pawelbielski Thank you for the suggestion! Implementing looping over latitudes in multiprocessing fashion, and over longitudes in iterative fashion, brought a speedup of 73.7% on a system with 8 cores!
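A sketch of this coarser-grained split, under the same assumptions as before (hypothetical `sim_fun` and `(n_lat, n_lon, n_time)` grid layout; the helper names are made up for illustration): each worker task now processes a whole latitude row, so there are only 256 tasks instead of 256 * 512, and each one is big enough to amortize the process-spawning overhead.

```python
import numpy as np
from joblib import Parallel, delayed


def sim_fun(a, b):
    # Hypothetical stand-in for the external, non-vectorizable similarity function
    return float(np.corrcoef(a, b)[0, 1])


def one_latitude_row(row, reference):
    # Iterative inner loop over longitudes, run inside a single worker
    return [sim_fun(row[j], reference) for j in range(row.shape[0])]


def similarity_by_latitude(grid, reference, n_jobs=8):
    """Parallelize over latitudes only: few, coarse tasks instead of
    many tiny per-cell tasks."""
    rows = Parallel(n_jobs=n_jobs)(
        delayed(one_latitude_row)(grid[i], reference)
        for i in range(grid.shape[0])
    )
    return np.array(rows)
```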
That is what I am talking about! Great job!
As already discussed in #3, in calculate_series_similarity() the sim_fun() is called 256 * 512 times in a for loop, and all the runs are independent from each other. Even though sim_fun() is an external function and cannot be easily vectorized, it might be possible to speed up the code by:

- using parallelization included in numpy (np.apply_along_axis() or similar)
- multiprocessing (e.g. joblib, or multiprocessing)
- other approaches...
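To make the baseline concrete, here is a minimal sketch of the loop structure being described. The actual signature of calculate_series_similarity() and the `sim_fun` stand-in are assumptions; only the nested-loop shape and the 256 * 512 independent calls come from the discussion above:

```python
import numpy as np


def sim_fun(a, b):
    # Hypothetical stand-in for the external similarity function
    return float(np.corrcoef(a, b)[0, 1])


def calculate_series_similarity(grid, reference):
    """Baseline: one sim_fun call per grid cell (e.g. 256 * 512 calls),
    each independent of the others -- an ideal parallelization target."""
    n_lat, n_lon, _ = grid.shape
    sim = np.empty((n_lat, n_lon))
    for i in range(n_lat):       # loop over latitudes
        for j in range(n_lon):   # loop over longitudes
            sim[i, j] = sim_fun(grid[i, j], reference)
    return sim
```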
While the multiprocessing speedup is limited by the number of CPUs on the machine, it can still give meaningful speedups of around 4x (on a PC) up to 10x (on a computing server), or even 20x+ (on a high-performance computing server).
This issue is a place to discuss different possible approaches.