JuliaLinearAlgebra / libblastrampoline

Using PLT trampolines to provide a BLAS and LAPACK demuxing library.
MIT License

NVBLAS #23

Open maleadt opened 3 years ago

maleadt commented 3 years ago

Might be interesting to experiment with NVBLAS: https://docs.nvidia.com/cuda/nvblas/index.html

The NVBLAS Library is a GPU-accelerated library that implements BLAS (Basic Linear Algebra Subprograms). It can accelerate most BLAS Level-3 routines by dynamically routing BLAS calls to one or more NVIDIA GPUs present in the system, when the characteristics of the call make it likely to speed up on a GPU.

Part of CUDA_jll: https://github.com/JuliaBinaryWrappers/CUDA_jll.jl/blob/44445f650547dd14db177336e488460e56d4f354/src/wrappers/x86_64-linux-gnu.jl#L164-L168

ViralBShah commented 3 years ago

@staticfloat I suppose this is going to be the same as MKL. Forwarding 64_ suffixed BLAS functions to the non-suffixed ones.
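As a rough illustration of what "forwarding 64_ suffixed BLAS functions to the non-suffixed ones" amounts to at the naming level, here is a minimal Python sketch. The helper `build_forwarding_table` is hypothetical and is not part of libblastrampoline; the symbol names are ordinary Level-3 BLAS routines.

```python
# Illustrative sketch only: maps 64_-suffixed BLAS symbol names onto the
# non-suffixed names a backend exports. This is not libblastrampoline code.

def build_forwarding_table(exported, suffix="64_"):
    """Return {suffixed_name: exported_name} for every exported symbol."""
    return {name + suffix: name for name in sorted(exported)}


# A handful of non-suffixed Level-3 BLAS symbols a backend might export.
backend_exports = {"sgemm_", "dgemm_", "cgemm_", "zgemm_"}

table = build_forwarding_table(backend_exports)
for suffixed, target in table.items():
    print(f"{suffixed} -> {target}")
```

Note that a name mapping like this only renames entry points; it does not change the integer width the backend expects in its arguments.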

@maleadt Are there any NVBLAS-specific init/threading APIs that need calling? Those would need to be added here, like we did for MKL in #19.

maleadt commented 3 years ago

No specific APIs to call. One problem is that this BLAS only supports a limited number of functions, and itself forwards to another BLAS (configurable via environment variables and a configuration file):

000000000000bfb0 g    DF .text  0000000000000282  libnvblas.so.11 chemm_
000000000000ca10 g    DF .text  0000000000000282  libnvblas.so.11 csyr2k_
0000000000009670 g    DF .text  00000000000002bd  libnvblas.so.11 cgemm_
00000000000090f0 g    DF .text  00000000000002bd  libnvblas.so.11 sgemm_
000000000000cf70 g    DF .text  0000000000000282  libnvblas.so.11 cher2k_
000000000000afb0 g    DF .text  000000000000029c  libnvblas.so.11 ctrsm_
000000000000aa70 g    DF .text  000000000000029c  libnvblas.so.11 strsm_
000000000000a320 g    DF .text  0000000000000250  libnvblas.so.11 zsyrk_
0000000000009e80 g    DF .text  0000000000000250  libnvblas.so.11 dsyrk_
000000000000c240 g    DF .text  0000000000000282  libnvblas.so.11 zhemm_
0000000000009930 g    DF .text  00000000000002bd  libnvblas.so.11 zgemm_
000000000000c4f0 g    DF .text  0000000000000282  libnvblas.so.11 ssyr2k_
000000000000a5b0 g    DF .text  0000000000000250  libnvblas.so.11 cherk_
00000000000093b0 g    DF .text  00000000000002bd  libnvblas.so.11 dgemm_
000000000000b250 g    DF .text  000000000000029c  libnvblas.so.11 ztrsm_
000000000000ad10 g    DF .text  000000000000029c  libnvblas.so.11 dtrsm_
000000000000b530 g    DF .text  0000000000000282  libnvblas.so.11 ssymm_
000000000000ba50 g    DF .text  0000000000000282  libnvblas.so.11 csymm_
000000000000da10 g    DF .text  00000000000002ac  libnvblas.so.11 ctrmm_
000000000000d4b0 g    DF .text  00000000000002ac  libnvblas.so.11 strmm_
000000000000c780 g    DF .text  0000000000000282  libnvblas.so.11 dsyr2k_
000000000000a800 g    DF .text  0000000000000250  libnvblas.so.11 zherk_
000000000000bce0 g    DF .text  0000000000000282  libnvblas.so.11 zsymm_
000000000000dcc0 g    DF .text  00000000000002ac  libnvblas.so.11 ztrmm_
000000000000b7c0 g    DF .text  0000000000000282  libnvblas.so.11 dsymm_
000000000000d760 g    DF .text  00000000000002ac  libnvblas.so.11 dtrmm_
000000000000a0d0 g    DF .text  0000000000000250  libnvblas.so.11 csyrk_
000000000000cca0 g    DF .text  0000000000000282  libnvblas.so.11 zsyr2k_
0000000000009c30 g    DF .text  0000000000000250  libnvblas.so.11 ssyrk_
000000000000d200 g    DF .text  0000000000000282  libnvblas.so.11 zher2k_

This breaks autodetection. Adding some symbol to the list works for suffix detection, but for interface detection that doesn't scale.
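To make the autodetection problem concrete, the following Python sketch parses a few lines of a symbol listing in the format shown in this thread and reports which probe symbols are absent. The probe names in `PROBES` are assumptions chosen for illustration, not necessarily the symbols libblastrampoline actually probes, and both helper functions are hypothetical.

```python
# Sketch: find which probe symbols are missing from a library's exports.
# The dump lines are copied from the NVBLAS symbol listing in this thread.
DUMP = """\
00000000000090f0 g    DF .text  00000000000002bd  libnvblas.so.11 sgemm_
00000000000093b0 g    DF .text  00000000000002bd  libnvblas.so.11 dgemm_
000000000000aa70 g    DF .text  000000000000029c  libnvblas.so.11 strsm_
000000000000a320 g    DF .text  0000000000000250  libnvblas.so.11 zsyrk_
"""


def exported_symbols(dump):
    """Collect the last whitespace-separated field of each non-empty line."""
    return {line.split()[-1] for line in dump.splitlines() if line.strip()}


def missing_probes(exports, probes):
    """Return the probe names that the library does not export, in order."""
    return [p for p in probes if p not in exports]


# Hypothetical probe list: one Level-3 routine plus two non-Level-3 routines.
PROBES = ["dgemm_", "isamax_", "zdotc_"]

exports = exported_symbols(DUMP)
print(missing_probes(exports, PROBES))  # prints ['isamax_', 'zdotc_']
```

Any detection step that relies on a routine outside the Level-3 set would fail in the same way against a library that exports only these symbols.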

[NVBLAS] NVBLAS_CONFIG_FILE environment variable is NOT set : relying on default config filename 'nvblas.conf'
[NVBLAS] Cannot open default config file 'nvblas.conf'
[NVBLAS] Config parsed
[NVBLAS] CPU Blas library need to be provided

ViralBShah commented 3 years ago

We can make nvblas.conf or the env variable point to the Julia-provided OpenBLAS.
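A minimal sketch of what such a configuration could look like, written as a small Python helper that renders the file contents. `NVBLAS_CPU_BLAS_LIB`, `NVBLAS_GPU_LIST`, and `NVBLAS_LOGFILE` are configuration keywords from the NVBLAS documentation, and `NVBLAS_CONFIG_FILE` is the environment variable named in the log output in this thread. The library path, the config file location, and the helper `render_nvblas_conf` are placeholders, not values taken from this thread.

```python
# Sketch: render an nvblas.conf that points NVBLAS at a CPU BLAS library.
# The path passed in below is a placeholder, not a real install location.

def render_nvblas_conf(cpu_blas_path, gpu_list="ALL", logfile="nvblas.log"):
    """Return the text of a minimal nvblas.conf."""
    lines = [
        f"NVBLAS_LOGFILE {logfile}",
        # CPU BLAS that NVBLAS falls back to for calls it does not offload.
        f"NVBLAS_CPU_BLAS_LIB {cpu_blas_path}",
        # Which GPUs NVBLAS may use; ALL selects every compatible device.
        f"NVBLAS_GPU_LIST {gpu_list}",
    ]
    return "\n".join(lines) + "\n"


conf_text = render_nvblas_conf("/path/to/libopenblas.so")
print(conf_text)

# The file would be written to disk and its location exported before the
# process loads NVBLAS; the location here is likewise a placeholder.
env_overrides = {"NVBLAS_CONFIG_FILE": "/path/to/nvblas.conf"}
```

Whether the Julia-provided OpenBLAS exports the symbol names NVBLAS expects from its CPU BLAS is something that would still need to be checked.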

ViralBShah commented 3 years ago

@maleadt - Ideally something like this is what we need to try out NVBLAS: https://github.com/JuliaLinearAlgebra/MKL.jl/blob/master/src/MKL.jl#L38

Of course, we'll then find things that don't quite work and perhaps LBT may need to be taught about NVBLAS. I suppose CUDA_jll does not include LAPACK.

maleadt commented 3 years ago

I suppose CUDA_jll does not include LAPACK.

Not a drop-in version like NVBLAS at least.