alexeckert / parallelDist

R Package: Parallel Distance Matrix Computation using Multiple Threads
GNU General Public License v2.0

Multi-threading only uses 1 thread #20

Closed altintasali closed 3 years ago

altintasali commented 3 years ago

Dear @alexeckert,

First of all, thank you so much for this amazing package. I have implemented parallelDist in most of my workflows.

While running parDist on both Ubuntu and macOS, I noticed that it uses only 1 thread even though I set it to use multiple threads. I therefore ran a quick benchmark of the run times.

library(parallelDist)
library(microbenchmarkCore)

sample.matrix <- matrix(c(1:100000), ncol = 10)

microbenchmarkCore::microbenchmark(
  "threads_1" = {
    dist.euclidean <- parDist(sample.matrix, method = "euclidean", threads = 1)  
  },
  "threads_4" = {
    dist.euclidean <- parDist(sample.matrix, method = "euclidean", threads = 4)  
  },
  "threads_8" = {
    dist.euclidean <- parDist(sample.matrix, method = "euclidean", threads = 8)  
  },
  times = 100,
  unit = "ms"
)

And here are the outputs from MacOS (2.8 GHz Quad-Core Intel Core i7)

Unit: milliseconds
      expr      min       lq     mean   median       uq      max neval
 threads_1 895.4964 975.5668 1073.367 1014.960 1062.621 2322.394   100
 threads_4 883.6357 971.3129 1138.207 1041.920 1182.696 3318.494   100
 threads_8 902.9203 975.4904 1095.014 1042.772 1129.013 2300.297   100

and Ubuntu (Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz)

Unit: milliseconds
      expr      min       lq     mean   median       uq      max neval
 threads_1 3711.082 3719.754 3723.159 3722.500 3724.361 3800.326   100
 threads_4 3713.285 3719.329 3721.863 3721.259 3723.983 3751.224   100
 threads_8 3710.879 3720.014 3730.472 3722.441 3726.025 4292.627   100

I saw that you mentioned in this post that the "Intel TBB lib" is needed for multi-threading, so I made sure that I have it installed.

As the results show, the run time does not differ across thread counts. Could you help me track down this issue? Even single-threaded runs are much faster than the other distance functions in R, so I am excited to see how much multi-threading can improve these run times.

In case it helps, here is my sessionInfo() output for both macOS and Ubuntu.

MacOS

R version 3.6.2 (2019-12-12)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmarkCore_1.0-0.4 parallelDist_0.2.4        

loaded via a namespace (and not attached):
[1] compiler_3.6.2     tools_3.6.2        Rcpp_1.0.5         RcppParallel_5.0.2

Ubuntu

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] microbenchmarkCore_1.0-0.4 parallelDist_0.2.4        

loaded via a namespace (and not attached):
[1] compiler_3.6.3     tools_3.6.3        Rcpp_1.0.5         RcppParallel_5.0.2

alexeckert commented 3 years ago

Hello @altintasali,

Thanks for your detailed report and tests. I could reproduce the error on Linux and Windows.

It seems that once the thread number is set in an R session it stays fixed, but when a new R session is started the number of threads can be set again (in my case I couldn't set it back to a single thread, and multiple threads were used for the rest of the session). I'm not sure whether this behavior is caused by a dependency update. Maybe there is another way than RcppParallel::setThreadOptions that allows this value to be changed dynamically during a session.
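
For reference, the session-wide setting discussed here can be sketched as follows. It assumes RcppParallel is installed and that `setThreadOptions` records the request in the `RCPP_PARALLEL_NUM_THREADS` environment variable, which is how recent RcppParallel versions appear to behave; `set_session_threads` is a hypothetical helper name, not part of either package.

```r
# Hypothetical helper: request a thread count for the current R session and
# return the value recorded in the environment (NA if nothing was recorded).
set_session_threads <- function(n) {
  stopifnot(is.numeric(n), length(n) == 1, n >= 1)
  if (!requireNamespace("RcppParallel", quietly = TRUE)) {
    return(NA_integer_)  # RcppParallel not installed; nothing to configure
  }
  RcppParallel::setThreadOptions(numThreads = as.integer(n))
  # Assumption: the request is mirrored in this environment variable.
  as.integer(Sys.getenv("RCPP_PARALLEL_NUM_THREADS", unset = NA))
}

print(set_session_threads(2))
```

Whether a later call with a different value takes effect in the same session is exactly what this issue is about, so the returned value only shows what was requested, not how many threads actually ran.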

alexeckert commented 3 years ago

Seems to be the same issue as mentioned here: https://github.com/RcppCore/RcppParallel/issues/110#issuecomment-699622188

altintasali commented 3 years ago

Hi @alexeckert,

Thanks for your input. I have also tried some other approaches, but none of them worked. It seems we need to wait for the next release of RcppParallel. I hope it will be fixed properly there.

kevinushey commented 3 years ago

RcppParallel 5.1.2 was just released to CRAN -- please let me know if you're still having any issues.
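
To check whether an installation already has this release, a version comparison along these lines should work. This is a minimal sketch using base R utilities; `has_min_version` is a hypothetical helper name.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `min_version` or newer.
has_min_version <- function(pkg, min_version) {
  requireNamespace(pkg, quietly = TRUE) &&
    utils::packageVersion(pkg) >= min_version
}

if (has_min_version("RcppParallel", "5.1.2")) {
  message("RcppParallel is at least 5.1.2")
} else {
  message("RcppParallel is missing or older than 5.1.2; consider updating it")
}
```

Because packages are loaded once per session, R usually needs to be restarted after updating before the new version is picked up.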

alexeckert commented 3 years ago

@kevinushey Thank you very much for your work. I've run some tests on Windows and Ubuntu, and the number of threads can be adjusted again without creating a new session. 👍

> library(parallelDist)
> sample.matrix <- matrix(c(1:500000), ncol = 10)
> system.time(parDist(x = sample.matrix, method = "euclidean", threads = 1))
   user  system elapsed
 91.619   3.191  94.809
> system.time(parDist(x = sample.matrix, method = "euclidean", threads = 8))
   user  system elapsed
196.357   1.510  26.506
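
A quick way to confirm that several threads were active is to compare CPU time with elapsed time. In the run with threads = 8 above, user plus system time (about 197.9 s) is roughly 7.5 times the elapsed time (26.506 s), while for threads = 1 the ratio is close to 1. The sketch below wraps that check using base R only; `cpu_to_elapsed` is a hypothetical helper name, not part of parallelDist.

```r
# Hypothetical helper: ratio of CPU time (user + system) to wall-clock time.
# A ratio near 1 suggests single-threaded execution; a ratio well above 1
# suggests that several threads were busy at the same time.
cpu_to_elapsed <- function(expr) {
  timing <- system.time(expr)  # expr is evaluated lazily, inside system.time
  cpu <- sum(timing[c("user.self", "sys.self")])
  elapsed <- timing[["elapsed"]]
  if (elapsed <= 0) return(NA_real_)  # too fast to measure reliably
  unname(cpu / elapsed)
}

# Usage, assuming parallelDist is installed:
if (requireNamespace("parallelDist", quietly = TRUE)) {
  m <- matrix(as.numeric(1:20000), ncol = 10)
  print(cpu_to_elapsed(
    parallelDist::parDist(m, method = "euclidean", threads = 4)
  ))
}
```

The ratio is only a rough indicator: very short runs are dominated by timer resolution, so a larger matrix gives a more trustworthy reading.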

kevinushey commented 3 years ago

Great news; I'm glad to hear it! Sorry for the trouble in the interim.