Open zbk2012 opened 3 months ago
Hi @zbk2012. From your example, it seems your config file is not properly indented. You are probably looking for something like this instead:
version: v1
sharing:
  mps:
    resources:
    - name: nvidia.com/gpu
      replicas: 2
This should also be confirmed by your device plugin logs.
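A quick local check for lost indentation, as a minimal sketch: in a correctly nested file the `mps:` key is indented under `sharing:`, while in a flattened copy it starts at column 0. The function name `mps_is_nested` and the file path are hypothetical and only used for illustration; this is a text-level check, not a YAML parser.

```shell
# Hypothetical helper: reports whether "mps:" is indented (nested) or at column 0 (flat).
mps_is_nested() {
  if grep -Eq '^[[:space:]]+mps:' "$1"; then
    echo "nested"
  else
    echo "flat"
  fi
}

# Example file (hypothetical path) using the config shown above.
cat > /tmp/mps-config-check.yaml << 'EOF'
version: v1
sharing:
  mps:
    resources:
    - name: nvidia.com/gpu
      replicas: 2
EOF

mps_is_nested /tmp/mps-config-check.yaml
```

If this reports `flat` for the file that is actually mounted into the plugin, the indentation was lost before the plugin read it, not only when pasting into the issue.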
Oh, I'm sorry, the indentation was missing when copying. The indentation in the config file is correct.
@zbk2012 could you provide the logs for GFD and the device plugin? For example, I use the following to deploy the plugin:
helm upgrade nvidia -i deployments/helm/nvidia-device-plugin \
    --namespace nvidia-device-plugin \
    --create-namespace \
    --set runtimeClassName=nvidia \
    --set config.name=nvidia-plugin-configs \
    --set nvidiaDriverRoot=/ \
    --set gfd.enabled=true
Where the config is created from:
cat << EOF > dp-mps-config.yaml
version: v1
flags:
  migStrategy: "none"
  failOnInitError: true
  nvidiaDriverRoot: "/"
  plugin:
    passDeviceSpecs: false
    deviceListStrategy:
    - envvar
    deviceIDStrategy: uuid
sharing:
  mps:
    renameByDefault: false
    resources:
    - name: nvidia.com/gpu
      replicas: 4
EOF
by running:
kubectl create cm -n nvidia-device-plugin nvidia-plugin-configs \
    --from-file=config=dp-mps-config.yaml
#################### logs:
using mps requires --mps-root to be specified.
#################### The contents of the nvidia-device-plugin.yml file are as follows:
#################### The contents of the /data/system-yaml/a100-mps.yaml file are as follows:
#################### I have added the following content to the nvidia-device-plugin.yml file:
The container started successfully, but no GPU was found and there is nothing in the /run/nvidia/mps directory.
How should I fill in MPS_ROOT?
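To make the "nothing in the directory" observation easier to report, here is a minimal sketch that can be run on the node. It only distinguishes a missing directory from an empty or populated one. The function name `check_mps_root` is hypothetical, and the default path `/run/nvidia/mps` is simply the directory mentioned above, not a confirmed default of the plugin.

```shell
# Hypothetical helper: classifies a directory as missing, empty, or populated.
check_mps_root() {
  dir="${1:-/run/nvidia/mps}"   # assumption: path taken from the comment above
  if [ ! -d "$dir" ]; then
    echo "missing"
  elif [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "empty"
  else
    echo "populated"
  fi
}

# Uses MPS_ROOT if it is set in the environment, otherwise the path above.
check_mps_root "${MPS_ROOT:-/run/nvidia/mps}"
```

Including this result next to the plugin logs shows whether the directory was never created or was created and left empty.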