XuehaiPan / nvitop

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
https://nvitop.readthedocs.io
Apache License 2.0

[Question] Unable to view GPU memory usage in Windows (N/A memory usage) #55

Closed: Infinitay closed this issue 1 year ago

Infinitay commented 1 year ago


Questions

Whether I use nvidia-smi or nvitop, they both list my GPU memory usage as N/A and WDDM:N/A respectively. Since nvitop uses NVIDIA's SMI under the hood, I suppose that's why both tools show the same problem. Oddly, Process Explorer does show the memory usage of my applications. My drivers are up to date, so the behavior is puzzling. Has anyone experienced this or, better yet, resolved it?

OS: Windows 10
GPU: RTX 3080 (driver 531.18)
Occurs in: both WSL2 and native Windows

XuehaiPan commented 1 year ago

> Whether I use nvidia-smi or nvitop, they both list my GPU memory usage as N/A and WDDM:N/A respectively.
>
> OS: Windows 10 GPU: RTX 3080 (531.18) Both on WSL2 or Windows environments

Hi @Infinitay, this is expected behavior, and there is nothing we can do on our side. Both nvidia-smi and nvitop query GPU states through the NVIDIA Management Library (NVML). On Windows, your GPU runs in Windows Display Driver Model (WDDM) mode, and in WDDM mode NVML cannot report per-process SM utilization or GPU memory usage.

GPU Memory Usage: Amount of memory used on the device by the context. Not available on Windows when running in WDDM mode because Windows KMD manages all the memory not NVIDIA driver.

Ref: nvidia-smi documentation
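To see where the "N/A" comes from, here is a minimal sketch, assuming the `nvidia-ml-py` package (imported as `pynvml`) is installed. NVML signals an unavailable value with the all-ones 64-bit sentinel `NVML_VALUE_NOT_AVAILABLE`, and recent `nvidia-ml-py` releases convert that sentinel to `None` for per-process `usedGpuMemory`. The helper names `format_process_memory` and `list_process_memory` are hypothetical and are not part of nvitop's API.

```python
from typing import List, Optional, Tuple

# NVML's "value not available" sentinel: (unsigned long long)-1, i.e. all ones.
NVML_VALUE_NOT_AVAILABLE_ULL = 2**64 - 1


def format_process_memory(used_bytes: Optional[int]) -> str:
    """Render per-process GPU memory, showing "N/A" when NVML has no value.

    Hypothetical helper: accepts both the raw sentinel and the None that
    nvidia-ml-py substitutes for it.
    """
    if used_bytes is None or used_bytes == NVML_VALUE_NOT_AVAILABLE_ULL:
        return "N/A"  # e.g. WDDM mode, where Windows KMD owns memory accounting
    return f"{used_bytes // (1024 * 1024)}MiB"


def list_process_memory(index: int = 0) -> List[Tuple[int, str]]:
    """Query NVML for (pid, memory) pairs on one GPU. Hypothetical helper."""
    import pynvml  # provided by the `nvidia-ml-py` package

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        procs = list(pynvml.nvmlDeviceGetComputeRunningProcesses(handle))
        procs += pynvml.nvmlDeviceGetGraphicsRunningProcesses(handle)
        return [(p.pid, format_process_memory(p.usedGpuMemory)) for p in procs]
    finally:
        pynvml.nvmlShutdown()
```

On a WDDM-mode GPU, `list_process_memory()` would be expected to return "N/A" for every process, matching what nvidia-smi and nvitop display. A process with both compute and graphics contexts appears in both NVML lists, so this sketch may list it twice.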

Driver Model

On Windows, the TCC and WDDM driver models are supported. The driver model can be changed with the (-dm) or (-fdm) flags. The TCC driver model is optimized for compute applications. I.E. kernel launch times will be quicker with TCC. The WDDM driver model is designed for graphics applications and is not recommended for compute applications. Linux does not support multiple driver models, and will always have the value of "N/A".

Current: The driver model currently in use. Always "N/A" on Linux.
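A small sketch for checking which driver model a GPU is using before trusting per-process memory numbers. It assumes the output of `nvidia-smi --query-gpu=index,driver_model.current --format=csv,noheader`, one "index, model" pair per line. The numeric codes mirror `nvmlDriverModel_t` in nvml.h (the values `pynvml.nvmlDeviceGetDriverModel(handle)` returns as a `[current, pending]` pair); all helper names here are hypothetical.

```python
from typing import Dict

# nvmlDriverModel_t: NVML_DRIVER_WDDM = 0, NVML_DRIVER_WDM = 1 (shown as "TCC").
DRIVER_MODEL_NAMES = {0: "WDDM", 1: "TCC"}


def driver_model_name(code: int) -> str:
    """Map an NVML driver-model code to the name nvidia-smi prints."""
    return DRIVER_MODEL_NAMES.get(code, f"unknown ({code})")


def per_process_memory_available(current_model: str) -> bool:
    # Conservative assumption: only TCC (Windows) and "N/A" (Linux, single
    # driver model) are treated as reporting per-process memory.
    return current_model in ("TCC", "N/A")


def parse_driver_models(csv_text: str) -> Dict[int, str]:
    """Parse `index, driver_model.current` CSV lines into {index: model}."""
    models = {}
    for line in csv_text.strip().splitlines():
        index, model = [field.strip() for field in line.split(",", 1)]
        models[int(index)] = model
    return models
```

For the GeForce card in this thread, the query would be expected to report "WDDM", so `per_process_memory_available` returns `False`, which is consistent with the N/A memory column.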

Processes

List of processes having Compute or Graphics Context on the device. Compute processes are reported on all the fully supported products. Reporting for Graphics processes is limited to the supported products starting with Kepler architecture.

Each Entry is of format "<GPU Index> <PID> <Type> <Process Name> <GPU Memory Usage>"

GPU Index: Represents NVML Index of the device.

PID: Represents Process ID corresponding to the active Compute or Graphics context.

Type: Displayed as "C" for Compute Process, "G" for Graphics Process, and "C+G" for the process having both Compute and Graphics contexts.

Process Name: Represents process name for the Compute or Graphics process.

GPU Memory Usage: Amount of memory used on the device by the context. Not available on Windows when running in WDDM mode because Windows KMD manages all the memory not NVIDIA driver.
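The documented entry format can be parsed with a short helper. This is a hypothetical sketch that follows only the field order quoted above ("<GPU Index> <PID> <Type> <Process Name> <GPU Memory Usage>"); real nvidia-smi table output adds borders and extra columns, so it would need adapting. It assumes memory is printed either as "N/A" or as a number of MiB.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessEntry:
    gpu_index: int
    pid: int
    kind: str                  # "C", "G", or "C+G"
    name: str
    memory_mib: Optional[int]  # None when reported as "N/A" (e.g. WDDM mode)


def parse_process_entry(line: str) -> ProcessEntry:
    """Parse one "<GPU Index> <PID> <Type> <Process Name> <Memory>" line."""
    gpu, pid, kind, rest = line.split(maxsplit=3)
    # Process names may contain spaces (Windows paths), so split memory off the right.
    name, memory = rest.rsplit(maxsplit=1)
    if kind not in ("C", "G", "C+G"):
        raise ValueError(f"unexpected process type: {kind!r}")
    if memory == "N/A":
        memory_mib = None
    else:
        memory_mib = int(memory[:-3]) if memory.endswith("MiB") else int(memory)
    return ProcessEntry(int(gpu), int(pid), kind, name, memory_mib)
```

For example, `parse_process_entry("0 4321 C python 3064MiB").memory_mib` is `3064`, while a WDDM-mode entry ending in "N/A" yields `None`.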

Note that TCC mode is not available for GeForce GPUs:

```text
Fri Mar  3 17:11:30 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 528.49       Driver Version: 528.49       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ... WDDM  | 00000000:09:00.0  On |                  N/A |
|  0%   42C    P8    42W / 350W |   3064MiB / 24576MiB |     22%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                           ^
                           +------ WDDM mode

$ nvidia-smi --force-driver-model=1
Unable to set driver model for GPU 00000000:09:00.0: Not Supported
Treating as warning and moving on.
All done.
```

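Because nvidia-smi treats the rejected change as a warning and still prints "All done.", a script attempting the switch would need to inspect the message rather than rely on the exit status alone. A hypothetical sketch, assuming the "Not Supported" wording seen above and that the command is run with Administrator rights; on hardware that does support TCC, the nvidia-smi help states the new driver model takes effect only after a reboot.

```python
import subprocess


def driver_model_change_supported(output: str) -> bool:
    """Return False if nvidia-smi's output reports the change as unsupported.

    Heuristic assumption: relies on the "Not Supported" message text.
    """
    return "Not Supported" not in output


def try_switch_to_tcc(gpu_index: int = 0) -> bool:
    """Attempt to force driver model 1 (TCC) on one GPU. Hypothetical helper."""
    result = subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index), "--force-driver-model=1"],
        capture_output=True,
        text=True,
    )
    # Check both streams, since the warning may not change the exit code.
    return driver_model_change_supported(result.stdout + result.stderr)
```

On a GeForce card such as the one above, `try_switch_to_tcc()` would be expected to return `False`.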
XuehaiPan commented 1 year ago

Feel free to ask to reopen this if you have more questions.