csgillespie / benchmarkme

Crowd sourced benchmarking
https://csgillespie.github.io/benchmarkme/

Strange timing results #38

Closed by wadudmiah 3 years ago

wadudmiah commented 3 years ago

Hi, I am getting very strange results for 2 and 4 processes. The 4 process run is taking slightly longer than the 2 process run:

inverse  FFT    eigen  cholesky
3.71     0.801  1.22   28.84
4.008    0.854  1.256  29.33

The R command I used to run the benchmark is res_io = benchmark_std(runs = 3, cores = 4). It doesn't look like this benchmark is multi-threaded.

csgillespie commented 3 years ago

How many cores do you have? What's your laptop spec?

drkrynstrng commented 3 years ago

I'm encountering the same issue, but I think we're misinterpreting what the parallel benchmarks do. I originally thought the parallel benchmarks were for a multi-threaded BLAS library (if installed) but this does not appear to be the case. Instead, looking at the source code and docs, they assess the impact of the overhead from using parallel::makeCluster() and run the same benchmark on each core simultaneously. So at some point we should expect worse performance as the number of cores increases, because of the overhead of setting up the cluster. Is that correct?
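A minimal sketch of that reading, using only the base parallel package rather than benchmarkme's own code. The helper names chol_task and time_on_workers are hypothetical, and the Cholesky task is just a stand-in for one benchmark. Each worker runs the same task, so adding workers adds cluster setup cost and contention without dividing the work:

```r
library(parallel)

# Hypothetical stand-in for a single benchmark: time a Cholesky
# decomposition of a random symmetric positive-definite matrix.
chol_task <- function(n = 300) {
  m <- crossprod(matrix(rnorm(n * n), n, n))
  system.time(chol(m))[["elapsed"]]
}

# Run the same task on every worker simultaneously and record
# how long the cluster itself took to set up.
time_on_workers <- function(workers, n = 300) {
  setup <- system.time(cl <- makeCluster(workers))[["elapsed"]]
  on.exit(stopCluster(cl))
  per_worker <- unlist(parLapply(cl, rep(n, workers), chol_task))
  list(setup = setup, per_worker = per_worker)
}

res <- time_on_workers(2)
print(res)
```

With more workers than physical cores, I'd expect the per-worker times to get worse rather than better, on top of the setup cost.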

To assess the impact of a multi-threaded BLAS library (OpenBLAS on Linux in my case), I set and varied the OPENBLAS_NUM_THREADS or OMP_NUM_THREADS environment variable in the shell before starting a new R session, then ran the benchmarks in serial mode (the default, cores = 0L). That does show some performance improvement, though less so for FFT.
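For anyone who wants to try the same comparison without restarting R by hand, here is a rough sketch that launches a fresh Rscript process per thread count. The helper name time_with_threads is hypothetical, and crossprod() is only a stand-in for a BLAS-heavy benchmark. The variable must be set before the child R process starts, and it only has an effect if R is linked against a multi-threaded BLAS that honours it:

```r
# Time a BLAS-backed operation in a fresh R process started with a
# given OPENBLAS_NUM_THREADS value. With the reference BLAS the
# setting is simply ignored and the timings should look the same.
time_with_threads <- function(threads, n = 500) {
  n <- as.integer(n)
  expr <- sprintf(
    "m <- matrix(rnorm(%d * %d), %d); cat(system.time(crossprod(m))[[3]])",
    n, n, n
  )
  rscript <- file.path(R.home("bin"), "Rscript")
  out <- system2(
    rscript,
    args = c("-e", shQuote(expr)),
    env = paste0("OPENBLAS_NUM_THREADS=", threads),
    stdout = TRUE
  )
  # The child prints only the elapsed time in seconds.
  as.numeric(tail(out, 1))
}

timings <- sapply(c(1, 2), time_with_threads)
print(timings)
```

A single run per setting is noisy, so repeating each measurement a few times would give a fairer picture.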

csgillespie commented 3 years ago

> Instead, looking at the source code and docs, they assess the impact of the overhead from using parallel::makeCluster() and run the same benchmark on each core simultaneously. So at some point we should expect worse performance as the number of cores increases, because of the overhead of setting up the cluster. Is that correct?

Yes, that's correct.