Enchufa2 / r-flexiblas

FlexiBLAS API Interface for R
GNU Lesser General Public License v3.0

flexiblas_set_num_threads() ignored by OpenBLAS #5

Open · go-ski opened this issue 7 months ago

go-ski commented 7 months ago

I am having issues (on an M3 Mac and on RHEL 8.8) controlling the number of threads used by OpenMP-built OpenBLAS. FlexiBLAS reports that the thread count was set, but a matrix computation still behaves as if all cores are used, and after the computation flexiblas_get_num_threads() is also reset to all cores.

I am having similar issues when using the GitHub R package wrathematics/openblasctl on the OpenBLAS library directly, so I wonder whether this is due to some change in OpenMP-built OpenBLAS that would require a fix in FlexiBLAS.

Below is the R session from the M3 Mac:

> library(flexiblas)
> flexiblas_avail()
[1] TRUE
> flexiblas_current_backend()
[1] "NETLIB"
>  x = matrix(runif(1e7), nrow = 1e4)
> system.time(t(x) %*% x)
   user  system elapsed 
  1.102   0.009   1.113 
> flexiblas_list()
[1] "NETLIB"         "__FALLBACK__"   "APPLE"          "OPENBLASOPENMP"
> flexiblas_switch(flexiblas_load_backend("OPENBLASOPENMP"))
> flexiblas_current_backend()
[1] "OPENBLASOPENMP"
> flexiblas_get_num_threads()
[1] 12
> system.time(t(x) %*% x)
   user  system elapsed 
  1.278   0.019   0.138 
> flexiblas_set_num_threads(1)
> flexiblas_get_num_threads()
[1] 1
> system.time(t(x) %*% x)
   user  system elapsed 
  1.231   0.018   0.133 
> flexiblas_get_num_threads()
[1] 12
> 
> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.3.1

Matrix products: default
BLAS:   FlexiBLAS OPENBLASOPENMP 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] flexiblas_3.4.0.2

loaded via a namespace (and not attached):
[1] compiler_4.3.2

And here is the R session on RHEL 8.8 (NCSA Delta cluster at the University of Illinois, login node):

[gostrouc@dt-login04 ~]$ module list

Currently Loaded Modules:
  1) gcc/11.4.0               9) flexiblas/3.3.0   17) libxcb/1.14
  2) openmpi/4.1.6           10) inputproto/2.3.2  18) xextproto/7.3.0
  3) cuda/11.8.0             11) kbproto/1.0.7     19) xtrans/1.4.0
  4) cue-login-env/1.0       12) xproto/7.0.31     20) libice/1.1.1
  5) slurm-env/0.1           13) libxau/1.0.8      21) libsm/1.2.4
  6) default-s11             14) libmd/1.0.4       22) libxt/1.3.0
  7) gcc-runtime/11.4.0      15) libbsd/0.11.7     23) libx11/1.8.4
  8) openblas/0.3.26+openmp  16) libxdmcp/1.1.4    24) r_flexiblas/4.3.3

[gostrouc@dt-login04 ~]$ Rscript -e "library(flexiblas); flexiblas_avail()"
[1] FALSE
[gostrouc@dt-login04 ~]$ export LD_PRELOAD=$FLEXIBLAS_HOME/lib64/libflexiblas.so
[gostrouc@dt-login04 ~]$ Rscript -e "library(flexiblas); flexiblas_avail()"
[1] TRUE
[gostrouc@dt-login04 ~]$ R

R version 4.3.3 (2024-02-29) -- "Angel Food Cake"
Copyright (C) 2024 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(flexiblas)
> flexiblas_list()
[1] "NETLIB"       "__FALLBACK__" "OPENBLAS"     "OPENBLASOMP" 
> x = matrix(runif(1e8), nrow = 1e5)
> system.time(t(x) %*% x)
   user  system elapsed 
 30.168   0.353  30.758 
> flexiblas_current_backend()
[1] "NETLIB"
> flexiblas_switch(flexiblas_load_backend("OPENBLASOMP"))
> flexiblas_current_backend()
[1] "OPENBLASOMP"
> flexiblas_get_num_threads()
[1] 128
> system.time(t(x) %*% x)
   user  system elapsed 
 32.947  30.015   4.488 
> system.time(t(x) %*% x)
   user  system elapsed 
 32.065  24.159   4.035 
> flexiblas_set_num_threads(1)
> flexiblas_get_num_threads()
[1] 1
> system.time(t(x) %*% x)
   user  system elapsed 
 28.900  23.665   3.820 
> system.time(t(x) %*% x)
   user  system elapsed 
 25.908  50.175   5.421 
> flexiblas_get_num_threads()
[1] 128
> sessionInfo()
R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux 8.8 (Ootpa)

Matrix products: default
BLAS/LAPACK: FlexiBLAS OPENBLASOMP;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] flexiblas_3.4.0

loaded via a namespace (and not attached):
[1] compiler_4.3.3 tools_4.3.3   
> q()
Save workspace image? [y/n/c]: n
[gostrouc@dt-login04 ~]$ flexiblas --version
FlexiBLAS version 3.3.0
[gostrouc@dt-login04 ~]$ module list

Currently Loaded Modules:
  1) gcc/11.4.0               9) flexiblas/3.3.0   17) libxcb/1.14
  2) openmpi/4.1.6           10) inputproto/2.3.2  18) xextproto/7.3.0
  3) cuda/11.8.0             11) kbproto/1.0.7     19) xtrans/1.4.0
  4) cue-login-env/1.0       12) xproto/7.0.31     20) libice/1.1.1
  5) slurm-env/0.1           13) libxau/1.0.8      21) libsm/1.2.4
  6) default-s11             14) libmd/1.0.4       22) libxt/1.3.0
  7) gcc-runtime/11.4.0      15) libbsd/0.11.7     23) libx11/1.8.4
  8) openblas/0.3.26+openmp  16) libxdmcp/1.1.4    24) r_flexiblas/4.3.3

[gostrouc@dt-login04 ~]$ export | grep BLAS
declare -x FLEXIBLAS_HOME="/sw/spack/deltas11-2023-03/apps/linux-rhel8-zen3/gcc-11.4.0/flexiblas-3.3.0-6rhymkk"
declare -x OPENBLAS_HOME="/sw/spack/deltas11-2023-03/apps/linux-rhel8-zen3/gcc-11.4.0/openblas-0.3.26-27qkoyp"
[gostrouc@dt-login04 ~]$

Enchufa2 commented 7 months ago

I think you should use a bigger problem before drawing conclusions from the timings: that operation takes too little time, so the overhead of threading distorts your benchmark.
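One way to read these timings is the ratio of CPU time (user + system) to elapsed time: a value well above 1 means several threads were busy. Below is a minimal R sketch of that check, assuming only the flexiblas package functions already used in the sessions above; the helpers `cpu_ratio` and `bench_threads` are hypothetical and not part of the package.

```r
# Ratio of CPU time (user + system) to wall-clock time for a
# system.time() result; values well above 1 suggest several threads ran.
# (Very fast calls may report elapsed = 0, hence the advice to use a bigger problem.)
cpu_ratio <- function(st) {
  (st[["user.self"]] + st[["sys.self"]]) / st[["elapsed"]]
}

# Hypothetical helper: time t(x) %*% x once per requested FlexiBLAS
# thread count and record what flexiblas_get_num_threads() reports afterwards.
bench_threads <- function(x, threads = c(1, 2, 4)) {
  rows <- lapply(threads, function(n) {
    flexiblas::flexiblas_set_num_threads(n)
    st <- system.time(t(x) %*% x)
    data.frame(
      requested = n,
      reported_after = flexiblas::flexiblas_get_num_threads(),
      elapsed = st[["elapsed"]],
      cpu_ratio = cpu_ratio(st)
    )
  })
  do.call(rbind, rows)
}
```

Applied to the Mac timing taken after `flexiblas_set_num_threads(1)` (user 1.231, system 0.018, elapsed 0.133), `cpu_ratio` is about 9.4, which is consistent with the report that several cores were still in use. With a single thread honored, the ratio should stay close to 1.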

But anyway, the last call to flexiblas_get_num_threads() should report 1, so there is clearly some issue here. I cannot reproduce this on my machine. I'll report it upstream.
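A minimal R sketch for checking this specific symptom (the thread count reverting after a BLAS call), assuming only the flexiblas package functions already used in the sessions above. The helper `threads_persist` and its `size` default are hypothetical, not part of the package.

```r
# Hypothetical helper: sets the FlexiBLAS thread count, runs one BLAS-backed
# product, and reports whether flexiblas_get_num_threads() still returns n.
# In the sessions above, it reverted to all cores after the product.
threads_persist <- function(n = 1, size = 2000) {
  flexiblas::flexiblas_set_num_threads(n)
  before <- flexiblas::flexiblas_get_num_threads()
  x <- matrix(runif(size * size), nrow = size)
  xtx <- t(x) %*% x  # BLAS call that should respect the thread setting
  after <- flexiblas::flexiblas_get_num_threads()
  message(sprintf("requested: %d, before: %d, after: %d",
                  as.integer(n), as.integer(before), as.integer(after)))
  before == n && after == n
}
```

With the OpenMP backend selected via `flexiblas_switch(flexiblas_load_backend(...))`, `threads_persist(1)` is expected to return `TRUE`; the sessions in this issue suggest it would return `FALSE` on the affected setups.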

Enchufa2 commented 7 months ago

We would need more details, though. What version of OpenBLAS is this? Is it the build provided by the distro, or did you compile it yourself? If you compiled it, could you please report the configuration, flags, etc.? And the same for FlexiBLAS. :)

go-ski commented 7 months ago

I have edited the RHEL 8.8 Delta example above to use a 10x bigger matrix and added version information. It is not a distro build: Delta is provisioned with Spack, and I don't have the configuration details. The fact that you cannot reproduce it (and that I used LD_PRELOAD on Delta) suggests that the issue could be with my setup. On the Mac, I upgraded macOS from 14.3 to 14.4 this morning, and it broke my FlexiBLAS builds with "no LC_RPATH's found", so let's put this on hold until I have a better reprex.