ICLDisco / dplasma

DPLASMA is a highly optimized, accelerator-aware implementation of a dense linear algebra package for distributed heterogeneous systems. It is designed to deliver sustained performance on distributed systems where each node features multiple sockets of multicore processors and, if available, accelerators, using the PaRSEC runtime as a backend.

CUDA build broken with CUDA 12: bungled cublas includes #92

Closed devreal closed 11 months ago

devreal commented 1 year ago

Describe the bug

Nvidia decided to deliberately break their API by mapping cublas v2 symbols to cublas symbols based on the order in which their headers are included. With CUDA 12.1.1 (on xsdk) the build of dplasma fails:

In file included from /apps/spacks/2023-05-19/opt/spack/linux-rocky9-x86_64/gcc-9.5.0/cuda-12.1.1-7lpbukftno3qrmzyonhzq5wbnw7qkxdl/include/cusolverDn.h:86,
                 from /home/jschuchart/src/dplasma/src/potrf_cublas_utils.h:12,
                 from /home/jschuchart/src/dplasma/build/src/zpotrf_wrapper.c:16:
/apps/spacks/2023-05-19/opt/spack/linux-rocky9-x86_64/gcc-9.5.0/cuda-12.1.1-7lpbukftno3qrmzyonhzq5wbnw7qkxdl/include/cublas_v2.h:59:2: error: #error "It is an error to include both cublas.h and cublas_v2.h"
   59 | #error "It is an error to include both cublas.h and cublas_v2.h"
      |  ^~~~~

Nvidia's own cusolverDn.h includes cublas_v2.h but potrf_cublas_utils.h includes cublas.h. I guess DPLASMA must use exclusively the v2 header to avoid this conflict.

To Reproduce

$ module swap cuda/12.1.1
$ cmake 
$ make

QingleiCao commented 1 year ago

This is where the error message comes from in cublas_v2.h:

#if defined(CUBLAS_H_)
#error "It is an error to include both cublas.h and cublas_v2.h"
#endif

Possible solutions: use cublas_v2.h only, or #undef CUBLAS_H_ before including cublas_v2.h.
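
To make the guard check concrete, here is a minimal stand-alone model of it that compiles without any CUDA headers. It is only a sketch: CUBLAS_H_ is the guard name quoted above, while the CONFLICT_* macros and the three functions are hypothetical names invented for illustration. The real headers are replaced by stand-ins, and the #error is replaced by a flag so that both cases can be observed. Note that the #undef only silences the check; it does not by itself reconcile the legacy and v2 declarations, which may still conflict, so using cublas_v2.h exclusively looks like the safer of the two options.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for cublas.h: the legacy header defines its include guard. */
#define CUBLAS_H_

/* Stand-in for the check in cublas_v2.h, evaluated while the guard is
 * still defined. The real header raises #error here; this model only
 * records the outcome in a (hypothetical) flag. */
#if defined(CUBLAS_H_)
#define CONFLICT_WITHOUT_UNDEF 1
#else
#define CONFLICT_WITHOUT_UNDEF 0
#endif

/* Workaround from the comment above: drop the legacy guard first. */
#undef CUBLAS_H_

/* The same check again, now that the guard has been removed. */
#if defined(CUBLAS_H_)
#define CONFLICT_AFTER_UNDEF 1
#else
#define CONFLICT_AFTER_UNDEF 0
#endif

/* Returns 1 if the modelled v2 check would have fired, 0 otherwise. */
int conflict_without_undef(void) { return CONFLICT_WITHOUT_UNDEF; }
int conflict_after_undef(void)   { return CONFLICT_AFTER_UNDEF; }

/* Prints both outcomes and asserts the behaviour described in the thread. */
void report_guard_check(void)
{
    printf("guard set:    conflict=%d\n", conflict_without_undef());
    printf("after #undef: conflict=%d\n", conflict_after_undef());
    assert(conflict_without_undef() == 1);
    assert(conflict_after_undef() == 0);
}
```

Calling report_guard_check() from a main() prints the two flags: the check fires while the guard is defined and stays quiet once it has been undefined.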