NVIDIA / k8s-device-plugin

NVIDIA device plugin for Kubernetes
Apache License 2.0

MPS Memory limits confusion #764

Open · RonanQuigley opened this issue 3 months ago

RonanQuigley commented 3 months ago

1. Quick Debug Information

2. Issue or feature description

I've configured MPS on an NVIDIA L40S with 10 replicas.

As per the MPS daemon logs, a default device pinned memory limit of 4606M (roughly 4.5GiB) has been set.

I0612 11:23:33.074061      53 main.go:187] Retrieving MPS daemons.
I0612 11:23:33.153182      53 daemon.go:93] "Staring MPS daemon" resource="nvidia.com/gpu"
I0612 11:23:33.218453      53 daemon.go:131] "Starting log tailer" resource="nvidia.com/gpu"
[2024-06-12 10:28:13.702 Control    69] Starting control daemon using socket /mps/nvidia.com/gpu/pipe/control
[2024-06-12 10:28:13.702 Control    69] To connect CUDA applications to this daemon, set env CUDA_MPS_PIPE_DIRECTORY=/mps/nvidia.com/gpu/pipe
[2024-06-12 10:28:13.725 Control    69] Accepting connection...
[2024-06-12 10:28:13.725 Control    69] NEW UI
[2024-06-12 10:28:13.725 Control    69] Cmd:set_default_device_pinned_mem_limit 0 4606M
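
For reference, the 4606M figure looks like the card's total framebuffer split evenly across the configured replicas (my assumption; quick check against the nvidia-smi output further down):

# sanity check: total memory / replicas vs the limit in the MPS control log
total_mib = 46068   # total memory reported by nvidia-smi for the L40S
replicas = 10       # sharing.mps replicas from values.yaml
print(total_mib / replicas)   # 4606.8, matching the 4606M default pinned mem limit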

However, if I look at this from the point of view of a client:

import torch 
torch.cuda.get_device_properties(torch.device('cuda'))
# _CudaDeviceProperties(name='NVIDIA L40S', major=8, minor=9, total_memory=45589MB, multi_processor_count=14)

Only the set_default_active_thread_percentage of 10 appears to be respected: the multi_processor_count drops from 142 to 14, but total_memory still reports the full card (45589MB) rather than anything close to the 4606M limit.
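
As a cross-check from the same client, comparing the device properties against torch.cuda.mem_get_info() (which wraps cudaMemGetInfo) might show where the limit does or does not surface; a sketch, with the expected values being my assumption:

import torch

# total_memory comes from cudaGetDeviceProperties and reports the whole card here
props = torch.cuda.get_device_properties(0)
print(props.total_memory // 1024**2)   # ~45589, i.e. the full L40S

# free/total as seen by cudaMemGetInfo -- checking whether this reflects the 4606M limit
free, total = torch.cuda.mem_get_info(0)
print(free // 1024**2, total // 1024**2)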

Here's some additional info from the application pod:

printenv | grep CUDA
CUDA_MPS_PIPE_DIRECTORY=/mps/nvidia.com/gpu/pipe

echo "get_default_device_pinned_mem_limit 0" | nvidia-cuda-mps-control
4G 
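
The same query can also be run from Python, together with the active thread percentage (a sketch; I'm assuming get_default_active_thread_percentage is supported by this MPS version, and CUDA_MPS_PIPE_DIRECTORY is already set in the pod env as shown above):

import subprocess

# send control commands over stdin, the same way as the echo pipeline above
for cmd in ("get_default_device_pinned_mem_limit 0",
            "get_default_active_thread_percentage"):
    out = subprocess.run(["nvidia-cuda-mps-control"], input=cmd + "\n",
                         capture_output=True, text=True)
    print(cmd, "->", out.stdout.strip())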

Why does nvidia-cuda-mps-control report one value for the memory limit while PyTorch reports something else? This doesn't look right to me, but maybe I'm missing something. For comparison, when I use MIG on an A100, the total_memory returned by PyTorch reflects the MIG instance rather than the total VRAM of the card.
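
One more diagnostic I can run from the client pod (a rough sketch; the assumption being that, if the MPS pinned memory limit is enforced at allocation time, an allocation past the 4606M limit should fail even though total_memory reports the whole card):

import torch

# try to allocate ~6 GiB, well past the 4606M default pinned mem limit
try:
    x = torch.empty(6 * 1024**3, dtype=torch.uint8, device='cuda')
    print("allocation of ~6 GiB succeeded")
except RuntimeError as err:
    print("allocation failed:", err)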

# values.yaml
nodeSelector: {
  nvidia.com/gpu: "true"
}

gfd: 
  enabled: true
  nameOverride: gpu-feature-discovery
  namespaceOverride: {{ nvidia_plugin.namespace }}
  nodeSelector: {
    nvidia.com/gpu: "true"
  }

nfd:
  master:
    nodeSelector: {
      nvidia.com/gpu: "true"
    }
    tolerations:
    - key: "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"
  worker:
    nodeSelector: {
      nvidia.com/gpu: "true"
    }

config: 
  default: "default"
  map:
    default: |-
    ls400: |-
      version: v1
      sharing:
        mps:
          resources:
          - name: nvidia.com/gpu
            replicas: 10    

Additional information that might help better understand your environment and reproduce the bug:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.67                 Driver Version: 550.67         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40S                    Off |   00000000:BE:00.0 Off |                    0 |
| N/A   32C    P8             35W /  350W |      35MiB /  46068MiB |      0%   E. Process |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     70756      C   nvidia-cuda-mps-server                         28MiB |
+-----------------------------------------------------------------------------------------+
github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open 90 days with no activity. This issue will be closed in 30 days unless new comments are made or the stale label is removed.