Closed by elezar 6 months ago.
With this change, we always specify limits in terms of GPU UUIDs when passing them to the MPS control daemon. We also check that the specified device indices are valid.
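The index-to-UUID translation and the index check can be sketched in a few lines of Python. This is an illustration, not the driver's implementation. `limits_by_uuid` is a hypothetical helper. It assumes that per-device limits are keyed by device index and that index `i` refers to the `i`-th device allocated to the claim (`device_uuids[i]`).

```python
from typing import Dict, List, Mapping, Union


def limits_by_uuid(
    per_index: Mapping[Union[int, str], str], device_uuids: List[str]
) -> Dict[str, str]:
    """Re-key per-device limits from device index to GPU UUID.

    Hypothetical helper (assumption): index i refers to device_uuids[i].
    Raises ValueError for keys that are not valid device indices.
    """
    result: Dict[str, str] = {}
    for key, limit in per_index.items():
        # YAML may decode a key such as `0` as an int or as a string; accept both.
        try:
            index = int(key)
        except (TypeError, ValueError):
            raise ValueError(f"invalid device index {key!r}: not an integer") from None
        # Reject negative or out-of-range indices instead of silently ignoring them.
        if not 0 <= index < len(device_uuids):
            raise ValueError(
                f"invalid device index {index}: claim has {len(device_uuids)} device(s)"
            )
        result[device_uuids[index]] = limit
    return result


if __name__ == "__main__":
    uuids = ["GPU-3109fa37-4445-73c7-b695-1b5a4d13f58e"]
    print(limits_by_uuid({0: "5Gi"}, uuids))
```

Re-keying by UUID up front means the daemon command never depends on how indices are interpreted inside the MPS container, and an invalid index fails early with a clear error.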
With this change, we see the following MPS control daemon spec:
```yaml
spec:
  containers:
  - args:
    - |-
      set -e
      rm -f /var/log/nvidia-mps/startup.log
      nvidia-cuda-mps-control -d
      echo set_default_active_thread_percentage 50 | nvidia-cuda-mps-control
      echo set_default_device_pinned_mem_limit GPU-f22fb098-d1b3-3806-2655-ba25f02229c1 10240M | nvidia-cuda-mps-control
      echo "startup complete" > /var/log/nvidia-mps/startup.log
      tail -n +1 -f /var/log/nvidia-mps/control.log
    command:
    - chroot
    - /driver-root
    - sh
    - -c
    env:
    - name: CUDA_VISIBLE_DEVICES
      value: GPU-f22fb098-d1b3-3806-2655-ba25f02229c1
```
Assuming the following claim parameters:
```yaml
---
apiVersion: gpu.resource.nvidia.com/v1alpha1
kind: GpuClaimParameters
metadata:
  namespace: sharing-demo
  name: gpu-mps-sharing
spec:
  sharing:
    strategy: MPS
    mpsConfig:
      defaultActiveThreadPercentage: 50
      defaultPinnedDeviceMemoryLimit: 10Gi
```
and
```yaml
spec:
  containers:
  - args:
    - |-
      set -e
      rm -f /var/log/nvidia-mps/startup.log
      nvidia-cuda-mps-control -d
      echo set_default_active_thread_percentage 50 | nvidia-cuda-mps-control
      echo set_default_device_pinned_mem_limit GPU-3109fa37-4445-73c7-b695-1b5a4d13f58e 5120M | nvidia-cuda-mps-control
      echo "startup complete" > /var/log/nvidia-mps/startup.log
      tail -n +1 -f /var/log/nvidia-mps/control.log
    command:
    - chroot
    - /driver-root
    - sh
    - -c
    env:
    - name: CUDA_VISIBLE_DEVICES
      value: GPU-3109fa37-4445-73c7-b695-1b5a4d13f58e
```
when using the following claim parameters, which override the default limit for device index `0`:
```yaml
---
apiVersion: gpu.resource.nvidia.com/v1alpha1
kind: GpuClaimParameters
metadata:
  namespace: sharing-demo
  name: gpu-mps-sharing
spec:
  sharing:
    strategy: MPS
    mpsConfig:
      defaultActiveThreadPercentage: 50
      defaultPinnedDeviceMemoryLimit: 10Gi
      defaultPerDevicePinnedMemoryLimit:
        0: 5Gi
```
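The two outputs above show that a `10Gi` default becomes `10240M` and that a per-device `5Gi` limit takes precedence over the default, giving `5120M`. The Python sketch below reproduces that rendering. It is not the driver's code. `to_mps_megabytes` and `pinned_mem_commands` are hypothetical helpers. The sketch assumes that only binary suffixes (`Ki`, `Mi`, `Gi`, `Ti`) need to be handled and that per-device limits have already been re-keyed by UUID.

```python
import re
from typing import List, Mapping, Optional

# Binary suffixes only (assumption); enough for values such as "10Gi" and "5Gi".
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_mps_megabytes(quantity: str) -> str:
    """Convert a quantity such as "10Gi" to whole MiB with an "M" suffix.

    Values that are not a multiple of 1 MiB are rounded down.
    """
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)", quantity)
    if match is None:
        raise ValueError(f"unsupported quantity {quantity!r}")
    total_bytes = int(match.group(1)) * _BINARY_SUFFIXES[match.group(2)]
    return f"{total_bytes // 2**20}M"


def pinned_mem_commands(
    device_uuids: List[str],
    default_limit: Optional[str],
    per_uuid_limits: Mapping[str, str],
) -> List[str]:
    """Render one set_default_device_pinned_mem_limit line per device UUID.

    A per-device limit (already keyed by UUID) overrides the default limit.
    """
    commands: List[str] = []
    for uuid in device_uuids:
        limit = per_uuid_limits.get(uuid, default_limit)
        if limit is None:
            continue  # no limit requested for this device
        commands.append(
            f"echo set_default_device_pinned_mem_limit {uuid} {to_mps_megabytes(limit)}"
            " | nvidia-cuda-mps-control"
        )
    return commands


if __name__ == "__main__":
    uuid = "GPU-3109fa37-4445-73c7-b695-1b5a4d13f58e"
    for line in pinned_mem_commands([uuid], "10Gi", {uuid: "5Gi"}):
        print(line)
```

Each rendered line matches the form of the `set_default_device_pinned_mem_limit` lines in the specs above, with the UUID in place of a device index.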