We ran into a problem with the default module in the GPU queues on Cedar. I added a CUDA module to the Cedar config, just as is done for other clusters (Helios, for example), but our lab stack's CUDA conflicts with the module loaded by SmartDispatch. I understand that adding the module to the queues is convenient because most people need it to use the GPU, but what if the user wants something else? In the best case the module is loaded for nothing; in the worst case it conflicts with the user's own.
I suggest that we remove all CUDA modules from the configuration files. Users already need to set up their environments, and loading CUDA should be part of that. We could add a temporary check when a GPU is requested, to alert users who requested a GPU but did not load a CUDA module. The check would only be temporary, to ease the transition for users relying on `module load CUDA`.
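To make the proposed temporary check concrete, here is a rough sketch of what it could look like. It is not SmartDispatch code: the function names `cuda_module_loaded` and `warn_if_gpu_without_cuda` and the `gpus_requested` parameter are hypothetical. It assumes the cluster's module system (Environment Modules or Lmod) exports the colon-separated `LOADEDMODULES` variable, and that CUDA modules have "cuda" somewhere in their name, which is only a heuristic.

```python
import os
import warnings


def cuda_module_loaded(environ=None):
    """Return True if any loaded module name contains 'cuda' (case-insensitive).

    Assumption: the module system exports LOADEDMODULES as a
    colon-separated list of loaded module names.
    """
    environ = os.environ if environ is None else environ
    loaded = environ.get("LOADEDMODULES", "")
    return any("cuda" in name.lower() for name in loaded.split(":") if name)


def warn_if_gpu_without_cuda(gpus_requested, environ=None):
    """Warn, and return True, when a GPU is requested but no CUDA module is loaded.

    `gpus_requested` is a hypothetical parameter standing in for however
    the launcher knows that the job asks for a GPU.
    """
    if gpus_requested > 0 and not cuda_module_loaded(environ):
        warnings.warn(
            "A GPU was requested but no CUDA module appears to be loaded. "
            "CUDA modules are no longer loaded by default; "
            "load one in your own environment setup."
        )
        return True
    return False
```

A warning rather than an error seems like the safer choice here, since some users may get CUDA from somewhere other than a module (our lab stack, for instance) and should not be blocked.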