Open ocaisa opened 1 year ago
Hello @ocaisa, thank you very much for your effort. We had a discussion at the University of Oslo with @terjekv and a few others. Do you have a summary of the restrictions that apply when distributing NVIDIA libraries, especially the CUDA runtime? We have a meeting with some top NVIDIA people and can bring this to their attention.
We've already had a discussion with them about this. We have a specific plan here: we parse the EULA to figure out what we can ship, and we strip out everything else, replacing each stripped file with a symlink to a special location. We assume that what is listed in the EULA is sufficient for the runtime (and that seems to be the case so far). For other cases (such as using the CUDA compiler), we have a script that reinstalls CUDA in that special location, unbreaking all the symlinks. It might be a little clearer with the PR I hope to make today.
When the symlinks are unbroken, there is no difference from a typical installation (except that the non-runtime parts are actually local).
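The strip-and-symlink idea described above can be sketched roughly as follows. This is a minimal illustration, not the actual EESSI hook: the function names `strip_to_whitelist` and `broken_symlinks` are hypothetical, and the rule for matching files against the whitelist (the file name up to its first dot) is an assumption made for the example.

```python
from pathlib import Path


def strip_to_whitelist(install_dir, injections_dir, whitelist):
    """Replace every non-whitelisted regular file under install_dir with a
    symlink to the same relative path under injections_dir.

    Returns (kept, replaced): sorted lists of relative POSIX-style paths.
    """
    install_dir = Path(install_dir)
    injections_dir = Path(injections_dir)
    kept, replaced = [], []
    # sorted() materialises the listing before we start modifying the tree
    for path in sorted(install_dir.rglob("*")):
        if path.is_symlink() or not path.is_file():
            continue
        rel = path.relative_to(install_dir)
        # Assumption: whitelist entries are bare names, compared against
        # the file name up to its first dot (libcudart.so.12 -> libcudart)
        if path.name.split(".")[0] in whitelist:
            kept.append(rel.as_posix())
            continue
        path.unlink()
        # The link is broken until CUDA is installed in injections_dir
        path.symlink_to(injections_dir / rel)
        replaced.append(rel.as_posix())
    return kept, replaced


def broken_symlinks(install_dir):
    """List symlinks under install_dir whose target does not exist."""
    install_dir = Path(install_dir)
    return sorted(
        p.relative_to(install_dir).as_posix()
        for p in install_dir.rglob("*")
        if p.is_symlink() and not p.exists()
    )
```

Once a full CUDA installation is placed at the injections location, `broken_symlinks` should return an empty list, which corresponds to the "unbroken" state mentioned above.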
Some progress here:
- `software.eessi.io`;
- `software.eessi.io`, in which files that are not in the EULA whitelist have been stripped out and replaced by symbolic links into `host_injections`;
- `CUDA-Samples/12.1-GCC-12.3.0-CUDA-12.1.1` installed for all CPU targets;
- `CUDA` in both `host_injections` and in software layer + installation of `CUDA-Samples` on top works;
- `post_sanitycheck_cuda` hook, mainly to get better logging on what gets included in the CUDA installation, and what is stripped out because it's not whitelisted;
- `CUDA-Samples` for all CPU targets, deploy those installations in `software.eessi.io/versions/2023.06`, and merge the PR;
- `gpu_support` scripts in EESSI repository (`.../versions/2023.06/scripts/gpu_support`?) + make necessary changes in follow-up PR (already done in #434);
- `aarch64` and `x86_64` that include GPU drivers;
- `neoverse_v1`);
There have been a number of issues and PRs to date related to this, but we now need to get this in order and bring all those efforts up to date. Here's the updated task list for supporting NVIDIA GPUs:
- `host_injections` subdirectory with the build bot (and for end users). WIP with https://github.com/EESSI/software-layer/pull/368
- `host_injections` (WIP with #381)
- `CUDAsamples` to verify CUDA compilation with this approach (WIP with #381)
- `/.singularity.d/libs` so our linker also works within containers). This requires updates to the `ld.config` that we ship for our linker. The relevant libraries are listed within https://github.com/apptainer/apptainer/blob/main/etc/nvliblist.conf
- `p7zip` to support unpacking RPMs (optional now that we have permission to ship the CUDA compatibility libraries under the CUDA EULA)