go-ski opened this issue 7 months ago.
I think you should use a bigger problem size before drawing conclusions from the timings: the operation takes so little time that threading overhead dominates your benchmark.
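To see why a bigger problem helps, here is a rough sketch that assumes the timed operation is a dense n x n matrix product, which costs about 2n³ floating-point operations. Multiplying n by 10 multiplies the work by 1000, while the fixed cost of starting and synchronizing threads stays roughly the same, so the overhead becomes a much smaller share of the measured time. The sizes below are illustrative and are not the ones from the original example.

```python
def gemm_flops(n: int) -> int:
    """Approximate flop count of a dense (n x n) @ (n x n) product: 2 * n**3."""
    return 2 * n ** 3


def work_ratio(n_small: int, n_large: int) -> float:
    """How many times more arithmetic the larger product performs."""
    return gemm_flops(n_large) / gemm_flops(n_small)


if __name__ == "__main__":
    for n in (500, 5000):  # illustrative sizes only
        print(f"n = {n:5d}: ~{gemm_flops(n) / 1e9:.2f} GFLOP")
    print(f"10x larger matrix -> {work_ratio(500, 5000):.0f}x more work")
```

The thread startup cost does not grow with n, so only the arithmetic term scales up.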
But in any case, the last call to flexiblas_get_num_threads() should report 1, so there is clearly an issue here. I cannot reproduce this on my machine. I'll report it upstream.
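The expected behavior can be checked outside R with a small script: request one thread, run a BLAS call, and confirm that flexiblas_get_num_threads() still returns 1 afterwards. This is a minimal sketch, not the reproducer from this issue. It assumes libflexiblas can be located with ctypes.util.find_library and that the build exports the CBLAS interface (cblas_dgemm); the helper names thread_setting_sticks, load_flexiblas and make_dgemm_workload are hypothetical.

```python
import ctypes
import ctypes.util
from typing import Callable, Optional


def thread_setting_sticks(set_threads: Callable[[int], None],
                          get_threads: Callable[[], int],
                          workload: Callable[[], None],
                          requested: int = 1) -> bool:
    """Request `requested` threads, run the workload, and report whether
    the backend still returns the requested count before and after it."""
    set_threads(requested)
    before = get_threads()
    workload()
    after = get_threads()
    print(f"requested={requested} before={before} after={after}")
    return before == requested and after == requested


def load_flexiblas() -> Optional[ctypes.CDLL]:
    """Hypothetical helper: open libflexiblas via ctypes, or return None."""
    path = ctypes.util.find_library("flexiblas")
    if path is None:
        return None
    lib = ctypes.CDLL(path)
    lib.flexiblas_set_num_threads.argtypes = [ctypes.c_int]
    lib.flexiblas_set_num_threads.restype = None
    lib.flexiblas_get_num_threads.argtypes = []
    lib.flexiblas_get_num_threads.restype = ctypes.c_int
    return lib


def make_dgemm_workload(lib: ctypes.CDLL, n: int = 2000) -> Callable[[], None]:
    """Hypothetical helper: build an n x n cblas_dgemm call (assumes CBLAS is exported)."""
    ROW_MAJOR, NO_TRANS = 101, 111  # enum values from the CBLAS standard
    buf = ctypes.c_double * (n * n)
    a, b, c = buf(), buf(), buf()  # zero-filled; only the timing matters here
    dptr = ctypes.POINTER(ctypes.c_double)
    lib.cblas_dgemm.restype = None
    lib.cblas_dgemm.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_int,
                                ctypes.c_int, ctypes.c_int, ctypes.c_int,
                                ctypes.c_double, dptr, ctypes.c_int,
                                dptr, ctypes.c_int,
                                ctypes.c_double, dptr, ctypes.c_int]

    def run() -> None:
        # C = 1.0 * A @ B + 0.0 * C
        lib.cblas_dgemm(ROW_MAJOR, NO_TRANS, NO_TRANS, n, n, n,
                        1.0, a, n, b, n, 0.0, c, n)
    return run


if __name__ == "__main__":
    lib = load_flexiblas()
    if lib is None:
        print("libflexiblas not found; nothing to check")
    else:
        ok = thread_setting_sticks(lib.flexiblas_set_num_threads,
                                   lib.flexiblas_get_num_threads,
                                   make_dgemm_workload(lib))
        print("thread setting kept" if ok else "thread setting was reset")
```

Keeping the check logic separate from the ctypes bindings means it can be exercised with stand-in functions, or pointed at OpenBLAS's own openblas_set_num_threads/openblas_get_num_threads to compare the two layers.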
We would need more details, though. What is the OpenBLAS version? Is this the build provided by the distro, or did you compile it yourself? If you compiled it, could you please share the configuration, flags, etc.? The same goes for FlexiBLAS. :)
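When the original build logs are not available (for example with a spack-provisioned library), OpenBLAS can report part of this itself. The sketch below calls OpenBLAS's openblas_get_config, openblas_get_corename, openblas_get_parallel and openblas_get_num_threads through ctypes. It assumes the library can be found by ctypes.util.find_library or that its full path is passed on the command line; the helper names are hypothetical.

```python
import ctypes
import ctypes.util
import sys
from typing import Optional

# Meaning of openblas_get_parallel()'s return value in OpenBLAS.
PARALLEL_MODES = {0: "sequential", 1: "pthreads", 2: "OpenMP"}


def describe_parallel(mode: int) -> str:
    """Translate openblas_get_parallel() output into a readable label."""
    return PARALLEL_MODES.get(mode, f"unknown ({mode})")


def openblas_build_info(path: Optional[str] = None) -> Optional[dict]:
    """Hypothetical helper: query build details from an OpenBLAS shared library."""
    path = path or ctypes.util.find_library("openblas")
    if path is None:
        return None
    lib = ctypes.CDLL(path)
    lib.openblas_get_config.restype = ctypes.c_char_p
    lib.openblas_get_corename.restype = ctypes.c_char_p
    lib.openblas_get_parallel.restype = ctypes.c_int
    lib.openblas_get_num_threads.restype = ctypes.c_int
    return {
        "config": lib.openblas_get_config().decode(),  # includes the version string
        "core": lib.openblas_get_corename().decode(),
        "threading": describe_parallel(lib.openblas_get_parallel()),
        "num_threads": lib.openblas_get_num_threads(),
    }


if __name__ == "__main__":
    info = openblas_build_info(sys.argv[1] if len(sys.argv) > 1 else None)
    if info is None:
        print("OpenBLAS not found; pass the library path as an argument")
    else:
        for key, value in info.items():
            print(f"{key}: {value}")
```

The "threading" field distinguishes a pthreads build from an OpenMP build, which is the distinction this issue hinges on.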
I have edited the RHEL 8.8 Delta example above to use a 10x bigger matrix and added the library versions. It is not a distro build: Delta is provisioned with spack, and I don't have the configuration details. The fact that you cannot reproduce it (and that I used LD_PRELOAD on Delta) suggests the issue could be with my setup. On the Mac, I upgraded macOS from 14.3 to 14.4 this morning, and that broke my FlexiBLAS builds with "no LC_RPATH's found", so let's put this on hold until I have a better reprex.
I am having trouble controlling the number of threads used by an OpenMP-built OpenBLAS (on an M3 Mac and on RHEL 8.8). FlexiBLAS reports that the thread count was set, but a matrix computation still behaves as if all cores are in use, and after the computation flexiblas_get_num_threads() is also reset to the full core count.

I see similar behavior when I use the GitHub R package wrathematics/openblasctl on the OpenBLAS library directly, so I wonder whether this is caused by some change in OpenMP-built OpenBLAS that would require a fix in the FlexiBLAS library.

Below is the R session from the M3 Mac:
And the R session on RHEL 8.8 (NCSA Delta cluster at U Illinois, login node):