NCAR / spack-gust

Spack production user software stack on the Gust test system

CUDA Libs present in base CUDA on Casper, missing on Gust common deployment #5

Closed: roryck closed this issue 2 years ago

roryck commented 2 years ago

On Casper, the base CUDA library directory

/glade/u/apps/dav/opt/cuda/11.0.3/lib64 

contains some numerical libraries that are missing from what appears to be the equivalent directory on Gust

/glade/u/apps/common/22.08/spack/opt/spack/nvhpc/22.7/gcc/7.5.0/Linux_x86_64/22.7/cuda/lib64

The missing libs that I've hit so far are:

libcublas.so.11
libcublasLt.so.11
libcufft.so.10
libcurand.so.10
libcusolver.so.11
libcusparse.so.11
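To compare the Casper and Gust directories quickly, a small check along these lines could be pointed at each lib dir. This is a minimal sketch, not part of the Gust deployment: `missing_libs` is a hypothetical helper, and the library list is simply the one reported above.

```python
import sys
from pathlib import Path

# Library names reported missing in this issue (hypothetical default list).
EXPECTED_LIBS = [
    "libcublas.so.11",
    "libcublasLt.so.11",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.11",
    "libcusparse.so.11",
]


def missing_libs(lib_dir, names=EXPECTED_LIBS):
    """Return the entries of `names` that are not present in lib_dir."""
    base = Path(lib_dir)
    # Path.exists() follows symlinks, so a dangling symlink is reported as missing.
    return [name for name in names if not (base / name).exists()]


if __name__ == "__main__":
    # Usage: python check_libs.py DIR [DIR ...]
    for d in sys.argv[1:]:
        print(d, missing_libs(d))
```

Running it once with the Casper path and once with the Gust nvhpc `cuda/lib64` path should reproduce the list above.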

vanderwb commented 2 years ago

Yeah, the nvhpc install doesn't put them in there, but instead puts them into:

/glade/u/apps/common/22.08/spack/opt/spack/nvhpc/22.7/gcc/7.5.0/Linux_x86_64/22.7/math_libs/11.7/targets/x86_64-linux/lib/libcublas.so.11

You should see the same for newer CUDAs on Casper too (like 11.6)... but I'll double-check.

I think on Casper I add the math_libs stuff to the NCAR wrapper paths and to LD_LIBRARY_PATH in the CUDA module. Would that work for your purposes?
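The proposed module change boils down to prepending the math_libs directory to LD_LIBRARY_PATH. A minimal sketch of that logic, assuming the nvhpc 22.7 path quoted above; in practice the CUDA modulefile itself (for example via a `prepend-path` directive) would do this, and `prepend_path` here is a hypothetical helper.

```python
import os

# Directory quoted earlier in this thread; other nvhpc/CUDA versions will differ.
MATH_LIBS = ("/glade/u/apps/common/22.08/spack/opt/spack/nvhpc/22.7/gcc/7.5.0/"
             "Linux_x86_64/22.7/math_libs/11.7/targets/x86_64-linux/lib")


def prepend_path(value, new_dir, sep=":"):
    """Return the colon-separated `value` with new_dir first and no duplicate of it."""
    parts = [p for p in (value or "").split(sep) if p and p != new_dir]
    return sep.join([new_dir] + parts)


if __name__ == "__main__":
    # Prints an export line for the user's shell; it does not change
    # LD_LIBRARY_PATH for any process that is already running.
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print(f'export LD_LIBRARY_PATH="{prepend_path(current, MATH_LIBS)}"')
```

Removing any existing copy before prepending keeps the search order predictable when the module is loaded more than once.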

roryck commented 2 years ago

Yes, if it gets added to LD_LIBRARY_PATH in the CUDA module, I believe it will get picked up by TensorFlow, which is where this is failing for me.
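As far as we know, TensorFlow loads its CUDA libraries at runtime through the dynamic loader, so one way to check whether the module change is enough is to try `dlopen`-ing each soname. A minimal sketch using Python's standard `ctypes`; `loadable` is a hypothetical helper. Note that glibc reads LD_LIBRARY_PATH when a process starts, so the module has to be loaded before Python is launched.

```python
import ctypes

SONAMES = ["libcublas.so.11", "libcublasLt.so.11", "libcufft.so.10",
           "libcurand.so.10", "libcusolver.so.11", "libcusparse.so.11"]


def loadable(sonames=SONAMES):
    """Map each soname to True if the dynamic loader can open it, else False."""
    result = {}
    for name in sonames:
        try:
            # A bare soname makes dlopen search LD_LIBRARY_PATH, the ld.so cache, etc.
            ctypes.CDLL(name)
            result[name] = True
        except OSError:
            # Raised if the library, or one of its own dependencies, is not found.
            result[name] = False
    return result


if __name__ == "__main__":
    for name, ok in loadable().items():
        print(f"{name}: {'found' if ok else 'NOT found'}")
```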

vanderwb commented 2 years ago

We will do dedicated CUDA installs via Spack to resolve this issue from now on (rather than pulling the semi-broken CUDA from the nvhpc install).