mig-parted apply returns the following error in some circumstances:
time="2024-09-30T19:49:46Z" level=error msg="\nThe following GPUs could not be reset:\n GPU 00000000:00:06.0: In use by another client\n\n1 device is currently being used by one or more other processes (e.g., Fabric Manager, CUDA application, graphics application such as an X server, or a monitoring application such as another instance of nvidia-smi). Please first kill all processes using this device and all compute applications running in the system.\n"
There are no other services or processes that use the GPU, but running sudo modprobe -r nvidia_drm first allows mig-parted to run afterwards. Given that DRM stands for Direct Rendering Manager, I am not sure we need this kernel module.
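A minimal sketch of how the workaround could be guarded, assuming nvidia_drm is indeed what keeps the GPU busy. The function names module_use_count and unload_nvidia_drm_if_idle are hypothetical helpers, not part of mig-parted; the only external commands used are lsmod, awk and modprobe.

```shell
#!/bin/sh
# Print the use count of a kernel module, given lsmod-style text on stdin
# (columns: Module, Size, Used by). Prints nothing if the module is not listed.
module_use_count() {
    awk -v mod="$1" '$1 == mod { print $3 }'
}

# Hypothetical helper: unload nvidia_drm only if it is loaded and nothing
# else holds a reference to it.
unload_nvidia_drm_if_idle() {
    count=$(lsmod | module_use_count nvidia_drm)
    if [ -z "$count" ]; then
        echo "nvidia_drm is not loaded"
    elif [ "$count" -eq 0 ]; then
        sudo modprobe -r nvidia_drm
    else
        echo "nvidia_drm is still in use (use count: $count)" >&2
        return 1
    fi
}
```

Calling unload_nvidia_drm_if_idle before mig-parted apply would reproduce the manual workaround, while refusing to unload the module when something else still depends on it.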
As far as I know, DRM and the /dev/dri/cardX devices are used by EGL, and hence by VirtualGL, if you want to run something on a compute node that renders via the GPU.