wviechtb closed this issue 3 years ago.
Thanks a lot for reporting this! @wviechtb
I set this environment variable in the template for the rocker/rstudio image we use for the RStudio with root user deployment: https://github.com/MaastrichtU-IDS/dsri-openshift-applications/blob/main/templates-anyuid/template-rstudio-root-persistent.yml#L69
- name: OPENBLAS_NUM_THREADS
  displayName: Number of threads for OpenBLAS
  description: Restricting the number of threads allocated to OpenBLAS can speed up computations using OpenBLAS (leave empty otherwise)
  value: ""
  required: false
And:
env:
  - name: OPENBLAS_NUM_THREADS
    value: "${OPENBLAS_NUM_THREADS}"
Could you try it in your project and let me know if it improves performance for you?
oc apply -f https://raw.githubusercontent.com/MaastrichtU-IDS/dsri-openshift-applications/main/templates-anyuid/template-rstudio-root-persistent.yml
Also, feel free to suggest a better description for the parameter, or a different default value. I was thinking of leaving it empty to keep the original behavior by default, but maybe it should be set to 0?
Thanks for getting started on this! But OPENBLAS_NUM_THREADS needs to be set to 1, not left blank. When it is blank, the default is used (which will be 64 on the DSRI nodes).
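One way to make a blank template value still end up as a single thread is to default the variable in the container's startup script. The sketch below is a hypothetical wrapper (the function name and the idea of wrapping the start command are assumptions, not part of the DSRI template or the rocker/rstudio image). It relies on the shell's `${VAR:=default}` expansion, which assigns the default when the variable is unset or set to an empty string, which is exactly the case of a template parameter left blank.

```shell
#!/usr/bin/env bash
# Hypothetical entrypoint wrapper for an RStudio container; the function name
# and its use as a wrapper are assumptions for illustration only.

# Run a command with OPENBLAS_NUM_THREADS defaulting to 1.
# The body is in ( ) so it runs in a subshell and does not change the
# caller's environment. ${VAR:=1} assigns 1 when the variable is unset OR
# set to "", which covers a template parameter that was left blank.
run_with_blas_default() (
  : "${OPENBLAS_NUM_THREADS:=1}"
  export OPENBLAS_NUM_THREADS
  exec "$@"
)

# When the script is invoked with a command, run that command through the wrapper.
if [ "$#" -gt 0 ]; then
  run_with_blas_default "$@"
fi
```

An explicit value such as 8 is passed through unchanged, so users who really want more threads can still set the parameter.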
Yes, but now you can choose the number of threads for OpenBLAS when you start an RStudio app from the template, and you can set it to 1.
I updated the RStudio with root user template in your project.
Ah, I see! I would suggest using 1 as the default in the template, though.
OK, I updated the template's default value to 1.
I tried it out and it works as intended. Thanks! I think the issue can be closed now.
The RStudio containers are configured to use OpenBLAS (great!). However, they do not put any restriction on the number of cores / threads that OpenBLAS is allowed to use. This is not so great, since using all 64 cores / 128 threads (which is what happens by default) usually ends up hampering performance. For example, creating a 200x200 matrix and then taking its inverse 100 times takes 20+ seconds, with all 128 threads at close to 100% utilization (in other words, the entire node is saturated).
When OpenBLAS is restricted to a single thread, the same computation takes around 0.2 seconds and only one thread is used.
The problem of implicitly using all cores is magnified even further with explicit parallelization, since every worker then tries to use all 64 cores and things slow down to a crawl.
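Usually, the number of threads is set via the OPENBLAS_NUM_THREADS environment variable (here, to 1).
This needs to happen before R/RStudio is started, because OpenBLAS reads the variable when the library is initialized. Setting the environment variable from within R, once R is already running, does not work.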
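From a shell, the variable has to be in the environment of the command that starts R, for instance by prefixing that command with the assignment. The sketch below uses a hypothetical helper, `start_with_threads`, and a child `sh` standing in for R so that it runs without R installed; with R available, the command passed in would be R or Rscript instead.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is an assumption): start a command with
# OPENBLAS_NUM_THREADS already present in its environment, so that a BLAS
# library loaded by that process sees the value at startup.
start_with_threads() {
  local n="$1"
  shift
  # The VAR=value prefix applies only to the environment of this one command;
  # the calling shell's environment is left unchanged.
  OPENBLAS_NUM_THREADS="$n" "$@"
}

# A child shell stands in for R: it sees the value it was started with.
start_with_threads 1 sh -c 'echo "OPENBLAS_NUM_THREADS=$OPENBLAS_NUM_THREADS"'
```

The same idea is what the template achieves at the container level: the value is fixed before any R process exists, rather than changed afterwards.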
So, it would be great if the RStudio containers could be configured to set this environment variable as described above. Those who need more cores for their matrix algebra (and know what they are doing) can still use the RhpcBLASctl package to adjust the number of threads.