ocaisa opened 9 months ago
I'm not sure what the best approach is here: for the mentioned script to work, you already need a working CVMFS install. Could there be support for a post-install script for CVMFS?
There is already an example of an exec resource that runs after CVMFS is installed. https://github.com/ComputeCanada/puppet-magic_castle/blob/main/site/profile/manifests/cvmfs.pp#L129
We could do the same for EESSI's script. Should the script only run on nodes with a GPU, or should we always run it?
Right now, the script errors out if it cannot successfully execute

`nvidia-smi --query-gpu=driver_version --format=csv,noheader`

so, as it currently is, it should only run on nodes with a GPU. It also currently requires that you initialise EESSI.
Now that I've seen that both of these can raise an issue, I'd like to add an option to not throw errors, and also to drop the requirement that EESSI is initialised (since I know the path to the script being called and the version of EESSI, initialisation is not actually necessary).
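A guarded call along those lines could look like the shell sketch below. The CVMFS path is my assumption based on the EESSI version layout, and the guard/fallback logic is only an illustration of the proposed non-fatal behaviour, not the actual implementation:

```shell
#!/bin/sh
# Sketch: call EESSI's GPU link script only when a usable NVIDIA driver
# responds, and never let a failure break the provisioning run.
# ASSUMPTION: this CVMFS path follows the EESSI version layout; verify it.
EESSI_VERSION="2023.06"
LINK_SCRIPT="/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh"

link_eessi_gpu_libs() {
    # Only attempt the symlinking when nvidia-smi exists and answers.
    if command -v nvidia-smi >/dev/null 2>&1 \
        && nvidia-smi --query-gpu=driver_version --format=csv,noheader >/dev/null 2>&1; then
        # Swallow errors so a node without the expected setup still provisions.
        sh "$LINK_SCRIPT" || echo "link script failed, continuing anyway"
    else
        echo "no usable NVIDIA GPU detected, skipping EESSI driver symlinking"
    fi
    return 0
}

link_eessi_gpu_libs
```

With a wrapper like this, the same resource could run on every node and simply become a no-op where there is no GPU.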
For EESSI, we implemented GPU support for the stack. To access the drivers, it basically requires that someone runs the script https://github.com/EESSI/software-layer/blob/2023.06-software.eessi.io/scripts/gpu_support/nvidia/link_nvidia_host_libraries.sh which does the symlinking, allows for the potential use of CUDA compatibility libraries, and also places (a symlink to) the libraries in a trusted location for the Gentoo Prefix linker.
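The symlinking idea can be illustrated with a toy example (all paths here are made up for demonstration; the real script discovers the host libraries and the trusted location itself):

```shell
#!/bin/sh
# Toy illustration: expose a host driver library in a directory the
# compatibility-layer linker would trust, via a symlink.
workdir=$(mktemp -d)
mkdir -p "$workdir/host_libs" "$workdir/trusted_dir"

# Stand-in for a host NVIDIA driver library found on the node.
touch "$workdir/host_libs/libcuda.so.1"

# The "link" step: the trusted location gets a symlink, not a copy,
# so the library always matches the driver actually installed on the host.
ln -sf "$workdir/host_libs/libcuda.so.1" "$workdir/trusted_dir/libcuda.so.1"

readlink "$workdir/trusted_dir/libcuda.so.1"
```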
In the next EESSI release we plan to add additional trusted locations for the linker (right now there is only one), which would mean that this script changes a little. So rather than reproduce what it does, I'd like to be able to call it directly.