Closed pawelbielski closed 4 years ago
Using joblib Parallel() led to a speedup of 19.5% compared to np.apply_along_axis on a system with 4 cores.
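For reference, a minimal sketch of what this per-cell Parallel() approach could look like. The `sim_fun` stand-in, the function name `similarity_parallel`, and the `(n_lat, n_lon, n_time)` grid layout are all assumptions for illustration, not the actual code from this repo:

```python
import numpy as np
from joblib import Parallel, delayed


def sim_fun(series_a, series_b):
    # Hypothetical stand-in for the external similarity function
    return float(np.corrcoef(series_a, series_b)[0, 1])


def similarity_parallel(grid, reference, n_jobs=4):
    """Call sim_fun for every (lat, lon) cell in parallel.

    grid: array of shape (n_lat, n_lon, n_time)
    reference: array of shape (n_time,)
    """
    n_lat, n_lon, _ = grid.shape
    # Flatten the spatial dimensions so each cell becomes one task
    flat = grid.reshape(-1, grid.shape[-1])
    results = Parallel(n_jobs=n_jobs)(
        delayed(sim_fun)(cell, reference) for cell in flat
    )
    return np.array(results).reshape(n_lat, n_lon)
```

Note that this creates one tiny task per cell, so the per-task multiprocessing overhead eats into the gains, which would be consistent with the modest 19.5% observed here.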
@pierretoussing From my experience, joblib works well only if the chunks to parallelize are big enough to amortize the multiprocessing overhead. Do you think that looping over latitudes in multiprocessing fashion, and over longitudes in iterative fashion, could bring some improvement?
@pawelbielski Thank you for the suggestion! Implementing looping over latitudes in multiprocessing fashion, and over longitudes in iterative fashion, brought a speedup of 73.7% on a system with 8 cores!
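A sketch of this coarser-grained split, under the same assumptions as before (hypothetical `sim_fun` and `(n_lat, n_lon, n_time)` grid layout; the helper names are made up for illustration): each worker task now processes a whole latitude row, so there are only 256 tasks instead of 256 * 512, and each one is big enough to amortize the process-spawning overhead.

```python
import numpy as np
from joblib import Parallel, delayed


def sim_fun(a, b):
    # Hypothetical stand-in for the external, non-vectorizable similarity function
    return float(np.corrcoef(a, b)[0, 1])


def one_latitude_row(row, reference):
    # Iterative inner loop over longitudes, run inside a single worker
    return [sim_fun(row[j], reference) for j in range(row.shape[0])]


def similarity_by_latitude(grid, reference, n_jobs=8):
    """Parallelize over latitudes only: few, coarse tasks instead of
    many tiny per-cell tasks."""
    rows = Parallel(n_jobs=n_jobs)(
        delayed(one_latitude_row)(grid[i], reference)
        for i in range(grid.shape[0])
    )
    return np.array(rows)
```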
That is what I am talking about! Great job!
As already discussed in #3, in calculate_series_similarity() the sim_fun() is called 256 * 512 times in a for loop, and all the runs are independent from each other. Even though sim_fun() is an external function and cannot be easily vectorized, it might be possible to speed up the code by:

- using parallelization included in numpy (np.apply_along_axis() or similar)
- multiprocessing (e.g. joblib, or multiprocessing)
- other approaches...
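To make the baseline concrete, here is a minimal sketch of the loop structure being described. The actual signature of calculate_series_similarity() and the `sim_fun` stand-in are assumptions; only the nested-loop shape and the 256 * 512 independent calls come from the discussion above:

```python
import numpy as np


def sim_fun(a, b):
    # Hypothetical stand-in for the external similarity function
    return float(np.corrcoef(a, b)[0, 1])


def calculate_series_similarity(grid, reference):
    """Baseline: one sim_fun call per grid cell (e.g. 256 * 512 calls),
    each independent of the others -- an ideal parallelization target."""
    n_lat, n_lon, _ = grid.shape
    sim = np.empty((n_lat, n_lon))
    for i in range(n_lat):       # loop over latitudes
        for j in range(n_lon):   # loop over longitudes
            sim[i, j] = sim_fun(grid[i, j], reference)
    return sim
```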
While the multiprocessing speedup is limited by the number of CPUs on the machine, it can still give meaningful speedups of around 4x (on a PC) up to 10x (on a computing server), or even 20x+ (on a high-performance computing server).
This issue is a place to discuss different possible approaches.