maleadt opened this issue 3 years ago
@staticfloat I suppose this is going to work the same way as MKL, i.e. forwarding the `64_`-suffixed BLAS functions to the non-suffixed ones.
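As a rough illustration, that forwarding boils down to a symbol-name mapping. The sketch assumes the trampoline exposes `64_`-suffixed entry points (e.g. `dgemm_64_`) and that the backing library exports plain Fortran-style names (e.g. `dgemm_`). `build_forward_table` is a hypothetical helper, not libblastrampoline's API.

```python
def build_forward_table(exported, suffix="64_"):
    """Map suffixed trampoline entry points to a library's plain symbol names.

    Hypothetical helper for illustration; not libblastrampoline's API.
    """
    table = {}
    for name in exported:
        # Only Fortran-style BLAS names (trailing underscore) get a suffixed alias,
        # e.g. "dgemm_" -> entry point "dgemm_64_".
        if name.endswith("_"):
            table[name + suffix] = name
    return table


if __name__ == "__main__":
    print(build_forward_table(["sgemm_", "dgemm_"]))
```

In this model, any entry point without a matching export is simply left unforwarded, which matters for a library that only exports part of BLAS.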
@maleadt Are there any NVBLAS-specific init/threading APIs that need calling? Those would need to be added here, as we did for MKL in #19.
No specific APIs to call. One problem is that NVBLAS only supports a limited number of functions (just the Level-3 BLAS routines listed below) and itself forwards to another BLAS, which is configured via environment variables and a configuration file:
```
000000000000bfb0 g DF .text 0000000000000282 libnvblas.so.11 chemm_
000000000000ca10 g DF .text 0000000000000282 libnvblas.so.11 csyr2k_
0000000000009670 g DF .text 00000000000002bd libnvblas.so.11 cgemm_
00000000000090f0 g DF .text 00000000000002bd libnvblas.so.11 sgemm_
000000000000cf70 g DF .text 0000000000000282 libnvblas.so.11 cher2k_
000000000000afb0 g DF .text 000000000000029c libnvblas.so.11 ctrsm_
000000000000aa70 g DF .text 000000000000029c libnvblas.so.11 strsm_
000000000000a320 g DF .text 0000000000000250 libnvblas.so.11 zsyrk_
0000000000009e80 g DF .text 0000000000000250 libnvblas.so.11 dsyrk_
000000000000c240 g DF .text 0000000000000282 libnvblas.so.11 zhemm_
0000000000009930 g DF .text 00000000000002bd libnvblas.so.11 zgemm_
000000000000c4f0 g DF .text 0000000000000282 libnvblas.so.11 ssyr2k_
000000000000a5b0 g DF .text 0000000000000250 libnvblas.so.11 cherk_
00000000000093b0 g DF .text 00000000000002bd libnvblas.so.11 dgemm_
000000000000b250 g DF .text 000000000000029c libnvblas.so.11 ztrsm_
000000000000ad10 g DF .text 000000000000029c libnvblas.so.11 dtrsm_
000000000000b530 g DF .text 0000000000000282 libnvblas.so.11 ssymm_
000000000000ba50 g DF .text 0000000000000282 libnvblas.so.11 csymm_
000000000000da10 g DF .text 00000000000002ac libnvblas.so.11 ctrmm_
000000000000d4b0 g DF .text 00000000000002ac libnvblas.so.11 strmm_
000000000000c780 g DF .text 0000000000000282 libnvblas.so.11 dsyr2k_
000000000000a800 g DF .text 0000000000000250 libnvblas.so.11 zherk_
000000000000bce0 g DF .text 0000000000000282 libnvblas.so.11 zsymm_
000000000000dcc0 g DF .text 00000000000002ac libnvblas.so.11 ztrmm_
000000000000b7c0 g DF .text 0000000000000282 libnvblas.so.11 dsymm_
000000000000d760 g DF .text 00000000000002ac libnvblas.so.11 dtrmm_
000000000000a0d0 g DF .text 0000000000000250 libnvblas.so.11 csyrk_
000000000000cca0 g DF .text 0000000000000282 libnvblas.so.11 zsyr2k_
0000000000009c30 g DF .text 0000000000000250 libnvblas.so.11 ssyrk_
000000000000d200 g DF .text 0000000000000282 libnvblas.so.11 zher2k_
```
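Grouping those 30 exports by routine shows they are exactly the Level-3 routines (gemm, symm, hemm, syrk, herk, syr2k, her2k, trmm, trsm) in each applicable precision. Below is a small Python sketch that does this grouping from `objdump -T`-style lines. The name-matching regex is a crude heuristic assumed for illustration: it only looks at the last field, and it could also match non-BLAS names that happen to start with s/d/c/z.

```python
import re
from collections import defaultdict

# Precision prefix (s/d/c/z) + routine name + trailing underscore, e.g. "zher2k_".
_BLAS_NAME = re.compile(r"^([sdcz])([a-z0-9]+)_$")


def group_exports(objdump_lines):
    """Group dynamic-symbol lines by BLAS routine.

    Returns {routine: set of precision prefixes}, e.g. {"gemm": {"s", "d", "c", "z"}}.
    Only the last whitespace-separated field (the symbol name) is inspected.
    """
    groups = defaultdict(set)
    for line in objdump_lines:
        fields = line.split()
        if not fields:
            continue
        match = _BLAS_NAME.match(fields[-1])
        if match:
            prefix, routine = match.groups()
            groups[routine].add(prefix)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "00000000000090f0 g DF .text 00000000000002bd libnvblas.so.11 sgemm_",
        "00000000000093b0 g DF .text 00000000000002bd libnvblas.so.11 dgemm_",
        "000000000000bfb0 g DF .text 0000000000000282 libnvblas.so.11 chemm_",
    ]
    for routine, prefixes in sorted(group_exports(sample).items()):
        print(routine, "".join(sorted(prefixes)))
```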
This breaks LBT's autodetection. Adding one of these symbols to the probe list would work for suffix detection, but for interface detection that approach doesn't scale.
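Here is a sketch of the distinction, under explicitly hypothetical assumptions. It assumes the probe list holds a routine NVBLAS does not export (`isamax_` is used here for illustration) and that the candidate suffixes are the ones listed. Neither assumption comes from LBT's source. Suffix detection is only a lookup of `base + suffix` in the export table, so any exported routine can serve as a probe. Interface (LP64 vs. ILP64) detection works differently. The general trick is to pass an integer whose two 32-bit halves differ and check which value the library acted on. That requires knowing how to call each probe routine safely, which is plausibly why adding symbols doesn't scale there. `lp64_view` only shows which value an LP64 library would read. It does not call any BLAS.

```python
import struct

# Candidate suffixes to try; an assumption for illustration, not LBT's exact list.
CANDIDATE_SUFFIXES = ["", "64_", "_64", "64"]


def detect_suffix(exported, probe_names):
    """Return the first suffix for which some probe symbol is exported, else None.

    `exported` is a set of symbol names; `probe_names` are base names like "isamax_".
    """
    for suffix in CANDIDATE_SUFFIXES:
        for base in probe_names:
            if base + suffix in exported:
                return suffix
    return None


def lp64_view(n):
    """Value an LP64 library would read through a pointer to the int64 `n`.

    An ILP64 library dereferences all 8 bytes. An LP64 library reads only the
    first 4, which (modelling a little-endian machine such as x86_64) are the
    low 32 bits, interpreted as a signed int32.
    """
    raw = struct.pack("<q", n)
    return struct.unpack("<i", raw[:4])[0]


if __name__ == "__main__":
    nvblas_exports = {"sgemm_", "dgemm_", "cgemm_", "zgemm_"}  # subset of the dump above
    print(repr(detect_suffix(nvblas_exports, ["isamax_"])))            # None
    print(repr(detect_suffix(nvblas_exports, ["isamax_", "dgemm_"])))  # ''
    n = (1 << 32) | 3  # halves differ: high word 1, low word 3
    print(n, lp64_view(n))  # 4294967299 3
```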
```
[NVBLAS] NVBLAS_CONFIG_FILE environment variable is NOT set : relying on default config filename 'nvblas.conf'
[NVBLAS] Cannot open default config file 'nvblas.conf'
[NVBLAS] Config parsed
[NVBLAS] CPU Blas library need to be provided
```
We can make `nvblas.conf` (or a config file named by the `NVBLAS_CONFIG_FILE` environment variable) point to the Julia-provided OpenBLAS.
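Below is a minimal Python sketch of generating such a config. NVBLAS reads one `KEYWORD value` pair per line. `NVBLAS_CPU_BLAS_LIB` names the CPU BLAS it falls back to, which is the library the log above says must be provided. `NVBLAS_GPU_LIST` and `NVBLAS_LOGFILE` are optional settings. The OpenBLAS path is a placeholder. It is also untested here whether NVBLAS can use Julia's default OpenBLAS directly, because that build exports `64_`-suffixed ILP64 symbols. `write_nvblas_conf` is a hypothetical helper.

```python
import os
import tempfile


def write_nvblas_conf(cpu_blas_lib, directory, gpu_list="ALL", logfile=None):
    """Write an nvblas.conf into `directory` and return its path.

    Hypothetical helper: emits one `KEYWORD value` line per setting.
    """
    lines = [
        f"NVBLAS_CPU_BLAS_LIB {cpu_blas_lib}",  # CPU BLAS that NVBLAS falls back to
        f"NVBLAS_GPU_LIST {gpu_list}",          # which GPUs NVBLAS may use
    ]
    if logfile is not None:
        lines.append(f"NVBLAS_LOGFILE {logfile}")
    path = os.path.join(directory, "nvblas.conf")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return path


if __name__ == "__main__":
    # Placeholder path: substitute the OpenBLAS shipped with your Julia install.
    conf = write_nvblas_conf("/path/to/libopenblas.so", tempfile.mkdtemp())
    # Without this variable NVBLAS falls back to ./nvblas.conf (see the log above).
    # Setting it here only affects this process and children started afterwards.
    os.environ["NVBLAS_CONFIG_FILE"] = conf
    with open(conf) as f:
        print(f.read())
```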
@maleadt Ideally, something like this is what we need to try out NVBLAS: https://github.com/JuliaLinearAlgebra/MKL.jl/blob/master/src/MKL.jl#L38
Of course, we'll then find things that don't quite work, and LBT may need to be taught about NVBLAS. I suppose CUDA_jll does not include LAPACK.
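Here is a toy model of one thing "teaching LBT about NVBLAS" might involve. NVBLAS exports only a subset of BLAS, so it would have to be layered on top of an existing backend. Everything else, including LAPACK, would then keep coming from the previous library. The sketch assumes that LBT keeps a per-symbol forwarding table and that a non-clearing load overrides only the symbols the new library exports. Julia's `LinearAlgebra.BLAS.lbt_forward` has a `clear` keyword for this, but check its documentation for the exact semantics. `forward` is a hypothetical function that models only the bookkeeping.

```python
def forward(table, library_name, exported, clear=False):
    """Point trampoline entries at `library_name` for every symbol it exports.

    With clear=True, entries for symbols the new library lacks are dropped.
    With clear=False, they keep pointing at whatever was loaded before.
    Returns a new table; the input table is not modified.
    """
    new_table = {} if clear else dict(table)
    for symbol in exported:
        new_table[symbol] = library_name
    return new_table


if __name__ == "__main__":
    base = forward({}, "openblas", {"dgemm_", "ddot_", "dpotrf_"}, clear=True)
    # Layered: dgemm_ moves to NVBLAS; ddot_ and dpotrf_ stay on OpenBLAS.
    layered = forward(base, "nvblas", {"dgemm_"})
    # Cleared: only dgemm_ remains, so the Level-1 BLAS and LAPACK entries are lost.
    replaced = forward(base, "nvblas", {"dgemm_"}, clear=True)
    print(sorted(layered.items()))
    print(sorted(replaced.items()))
```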
> I suppose CUDA_jll does not include LAPACK.
Not a drop-in version like NVBLAS, at least.
Might be interesting to experiment with NVBLAS: https://docs.nvidia.com/cuda/nvblas/index.html
Part of CUDA_jll: https://github.com/JuliaBinaryWrappers/CUDA_jll.jl/blob/44445f650547dd14db177336e488460e56d4f354/src/wrappers/x86_64-linux-gnu.jl#L164-L168