UOB-AI / UOB-AI.github.io

A repository to host our documentation website.
https://UOB-AI.github.io

Installing gpustat on the cluster #6

Closed heshaaam closed 1 year ago

heshaaam commented 1 year ago

Salam.

Is it possible to install gpustat (https://pypi.org/project/gpustat/) so that users can see how much of the GPU and its memory their running job is using? The command would be run on the compute node itself, after they ssh there.
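For reference, this is roughly how it would be used from a notebook cell once installed (drop the leading ! in a terminal on the compute node); the user-level pip install is only an assumption about how it might be made available:

# Assumed user-level install; a system-wide or module-based install may be preferred.
!pip install --user gpustat

# Per-GPU utilisation and memory, plus the user and PID of each running process.
!gpustat --show-user --show-pid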

Or maybe there's an easier way to do this?

asubah commented 1 year ago

We can look into it. For now, they can use nvidia-smi in the terminal or !nvidia-smi from a Jupyter notebook cell. It shows the total GPU RAM utilisation and the running processes, as in the following example:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:D8:00.0 Off |                    0 |
| N/A   31C    P8    10W /  70W |      2MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
asubah commented 1 year ago

For shorter output, you can also use nvidia-smi --query-gpu=utilization.gpu,utilization.memory --format=csv
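If the numbers are needed inside Python rather than just printed to the terminal, a minimal sketch along these lines should work (the query fields and CSV options are standard nvidia-smi properties; the parsing itself is only an illustration):

import subprocess

# Ask nvidia-smi for per-GPU utilisation and memory in bare CSV form.
result = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,utilization.gpu,memory.used,memory.total",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
)

for line in result.stdout.strip().splitlines():
    index, util, mem_used, mem_total = (field.strip() for field in line.split(","))
    print(f"GPU {index}: {util}% busy, {mem_used}/{mem_total} MiB used")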

Or nvidia-smi dmon

Monitors default metrics for up to 4 supported devices under natural enumeration (starting with GPU index 0) at a frequency of 1 sec. Runs until terminated with ^C.

Read more about it in the "Device Monitoring" section: https://www.systutorials.com/docs/linux/man/1-nvidia-smi/
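As a concrete example, something like the following should give a short burst of utilisation samples instead of running until interrupted (using the -s, -d, and -c options described in that man page; drop the leading ! in a terminal):

# Sample utilisation metrics (-s u) every 2 seconds (-d 2), 10 samples in total (-c 10).
!nvidia-smi dmon -s u -d 2 -c 10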

asubah commented 1 year ago

We created this dashboard, which can be used alongside the above to view the utilisation of the cluster resources: https://hayrat.uob.edu.bh/stats/general