Open gfrankliu opened 2 years ago
Typically kernel modules are loaded by the systemd-modules-load.service (including the nvidia module). This is part of the sysinit.target which the nvidia-mig-manager.service depends on.
Nvidia GPU drivers (kernel module) are only loaded when used, and unloaded when not in use. To solve that issue, a persistence daemon is used. Details can be found at https://download.nvidia.com/XFree86/Linux-x86_64/396.51/README/nvidia-persistenced.html
We need to make sure that the daemon is started first on a server, so that the GPU kernel module is loaded even when it is not being used.
The nvidia kernel module is most definitely not loaded and unloaded across each use. Typically it is loaded at system boot and then remains loaded until the system is shut down.
If it is not loaded at system boot then running one of the nvidia utilities (e.g. nvidia-smi or nvidia-persistenced) will load the kernel module before it runs. It will not unload it once it is done though. It will remain loaded until the system shuts down or a user explicitly unloads it (via rmmod for example).
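A quick way to confirm whether the module is currently loaded is to look for it in /proc/modules, where the first field of each line is the module name. The following is a minimal sketch of that check; the helper names are illustrative and not part of any nvidia tooling.

```python
def is_module_loaded(name: str, proc_modules_text: str) -> bool:
    """Return True if `name` is listed in /proc/modules-style text.

    Each line of /proc/modules starts with the module name, so an exact
    match on the first field avoids confusing "nvidia" with "nvidia_uvm".
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def read_proc_modules(path: str = "/proc/modules") -> str:
    """Read the kernel's list of loaded modules (Linux only)."""
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    print(is_module_loaded("nvidia", read_proc_modules()))
```

Running this right after boot, before any nvidia utility has been invoked, should show whether the module was loaded during sysinit or only later.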
What the page you linked refers to is about keeping the GPU in persistence mode or not. This has nothing to do with whether the module is loaded or not, but rather whether the (always loaded) module keeps GPU state alive or not across operations.
Without the persistenced service (or the old persistence mode being enabled) the driver will tear down GPU state across each operation, making it very slow to respond. With persistenced this state is kept alive and the driver is much more responsive.
In any case, it seems that your system is not loading the module during sysinit and instead relying on the persistenced service to do it for you.
I would recommend adding the nvidia module to the set of preloaded modules as is commonly done on other systems (e.g. the Nvidia DGX systems that this nvidia-mig-manager.service was built for and is tested on).
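For reference, systemd-modules-load.service reads *.conf files from /etc/modules-load.d/, each listing one module name per line. The sketch below writes such a file; the file name nvidia.conf and the helper itself are assumptions for illustration, and writing to the real directory requires root.

```python
from pathlib import Path
from typing import Iterable


def write_modules_load_conf(conf_dir: str, modules: Iterable[str],
                            filename: str = "nvidia.conf") -> Path:
    """Write a modules-load.d style file: one kernel module name per line.

    `filename` is a hypothetical choice; any name ending in .conf is read.
    """
    path = Path(conf_dir) / filename
    path.write_text("".join(f"{module}\n" for module in modules),
                    encoding="utf-8")
    return path


if __name__ == "__main__":
    # On a real system conf_dir would be /etc/modules-load.d (needs root).
    print(write_modules_load_conf("/etc/modules-load.d", ["nvidia"]))
```

The resulting file contains the single line `nvidia` and takes effect on the next boot.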
You are right that the modules are indeed autoloaded, but there seems to be a short delay, and nvidia-mig-manager.service sometimes runs before the load has finished. Since nvidia-mig-manager.service is a oneshot service, it does not get another chance. By the time I log in to check, I do see the modules loaded, and if I manually run nvidia-mig-manager.service again, it works fine. Maybe we can have nvidia-mig-manager.service wait and retry.
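To make the wait-and-retry idea concrete, here is a rough sketch of polling /proc/modules until the nvidia module appears or a timeout expires. This is not how nvidia-mig-manager works today; the function name, timeout and interval are all assumptions.

```python
import time
from typing import Callable


def _read_proc_modules() -> str:
    """Read the kernel's list of loaded modules (Linux only)."""
    with open("/proc/modules", encoding="utf-8") as f:
        return f.read()


def wait_for_module(name: str,
                    timeout_s: float = 30.0,
                    interval_s: float = 1.0,
                    read_modules: Callable[[], str] = _read_proc_modules,
                    sleep: Callable[[float], None] = time.sleep,
                    clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll until `name` is loaded; return False if the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        # The first field of each /proc/modules line is the module name.
        loaded = any(line.split()[:1] == [name]
                     for line in read_modules().splitlines())
        if loaded:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    raise SystemExit(0 if wait_for_module("nvidia") else 1)
```

A check like this could run ahead of the oneshot work so that a slow module load delays the service instead of failing it.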
Would it cause any potential issues if we were to move nvidia-persistenced.service from Before to "After"?
Unlike nvidia-mig-manager.service (oneshot), persistenced is a daemon so it can catch up even if it starts before mig-manager.
In general, the nvidia-mig-manager.service needs to come online and run to completion before all nvidia services other than nvidia-persistenced, nvidia-fabricmanager and nv_peer_mem. On my system this includes:
nvidia-mlnx-config.service
nvsm-api-gateway.service
nvsm-mqtt.service
nvsm-notifier.service
nvsm-plugin-environment.service
nvsm-plugin-gpumonitor.service
nvsm-plugin-memory.service
nvsm-plugin-monitor-environment.service
nvsm-plugin-monitor-network.service
nvsm-plugin-monitor-pcie.service
nvsm-plugin-monitor-storage.service
nvsm-plugin-monitor-system.service
nvsm-plugin-network.service
nvsm-plugin-pcie.service
nvsm-plugin-processor.service
nvsm-plugin-storage.service
nvsm-selwatcher.service
nvsm.service
The reason for this is that these services become clients of the GPU, prohibiting the nvidia-mig-manager from resetting any GPUs if mig-mode changes are necessary.
Unfortunately, the only dependencies these services have in the systemd dependency graph are on the sysinit.target (which is why we explicitly say that the nvidia-mig-manager has to run Before this target completes so we can be sure that the nvidia-mig-manager is finished before any of these services start).
Likewise, the nvidia-persistenced service also has a dependence on sysinit.target so it can't be moved before the nvidia-mig-manager service so long as we need the nvidia-mig-manager to run before the sysinit.target.
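The ordering constraint can be illustrated with a small model of the "starts before" edges described above. This is only a sketch: the edge list is a simplification of the thread's description (it assumes nvidia-persistenced is ordered after sysinit.target and uses nvsm.service as one representative GPU client), not a dump of the real unit files.

```python
from graphlib import CycleError, TopologicalSorter

MIG = "nvidia-mig-manager.service"
PERSISTENCED = "nvidia-persistenced.service"
SYSINIT = "sysinit.target"
NVSM = "nvsm.service"


def start_order(before_edges):
    """Return a valid start order for (earlier, later) ordering pairs.

    Raises graphlib.CycleError if the constraints contradict each other.
    """
    graph = {}
    for earlier, later in before_edges:
        graph.setdefault(earlier, set())
        graph.setdefault(later, set()).add(earlier)
    return list(TopologicalSorter(graph).static_order())


# Simplified view of the current setup (assumed from the discussion).
current_ordering = [
    (MIG, SYSINIT),           # mig-manager runs Before sysinit.target
    (MIG, PERSISTENCED),      # mig-manager runs Before persistenced
    (SYSINIT, PERSISTENCED),  # persistenced starts after sysinit.target
    (SYSINIT, NVSM),          # GPU clients start after sysinit.target
]

# Proposed change: start persistenced before mig-manager instead.
proposed_ordering = [e for e in current_ordering if e != (MIG, PERSISTENCED)]
proposed_ordering.append((PERSISTENCED, MIG))

if __name__ == "__main__":
    print(start_order(current_ordering))
    try:
        start_order(proposed_ordering)
    except CycleError as err:
        print("ordering cycle:", err.args[1])
```

Under these assumptions the proposed ordering forms a cycle (persistenced before mig-manager before sysinit.target before persistenced), which is the conflict described above.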
The right way to do this would be to have all of the nvidia services that can become clients of the GPU depend on an intermediate target that then, in turn, depends on sysinit.target. Unfortunately that is not how things are set up at the moment though, so this is the best we can do.
Based on https://github.com/NVIDIA/mig-parted/blob/main/deployments/systemd/nvidia-mig-manager.service#L19, nvidia-mig-manager.service starts before nvidia-persistenced.service. This causes a problem because nvidia-persistenced.service is responsible for loading the nvidia kernel modules on a server, so it needs to start first. Otherwise nvidia-mig-manager.service won't be able to set up MIG, because the nvidia drivers are not loaded yet.