XuehaiPan / nvitop

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
https://nvitop.readthedocs.io
Apache License 2.0

[Feature Request] Refresh rate < 1 sec #65

Closed · BlueskyFR closed this issue 1 year ago

BlueskyFR commented 1 year ago

Required prerequisites

Motivation

I see the current minimum refresh rate is 1 second. Could it be something like 0.1 sec so that we can get a more accurate overview of what is happening on the GPU?

Solution

-

Alternatives

-

Additional context

-

XuehaiPan commented 1 year ago

Duplicate of #32, #63.

Could it be something like 0.1 sec

Hi @BlueskyFR, the latency of the NVML API calls is relatively high, so I think it is not meaningful to support intervals as small as 0.1 seconds. If you want a fine-grained report of resource usage, you should probably use a profiler instead.

so that we can get a more accurate overview of what is happening on the GPU?

  1. You can select a process and then press the <Enter> key. The metrics on the top row will refresh every 1/4 sec.

[Screenshot: Process Metrics Screen, which watches metrics for a specific process (shortcut: Enter / Return).]

  2. Use nvitop.ResourceMetricCollector; see the Resource Metric Collector documentation for more information. A rough usage sketch is shown below.
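
For reference, here is a minimal sketch of how ResourceMetricCollector can be used to sample metrics while a piece of code runs. The 0.5-second interval, the 'train' tag, and the sleep standing in for a real workload are illustrative choices; see the nvitop documentation for the authoritative API.

import time

from nvitop import Device, ResourceMetricCollector

# Sample all visible CUDA devices at a sub-second interval.
collector = ResourceMetricCollector(Device.cuda.all(), interval=0.5)

with collector(tag='train'):
    time.sleep(5.0)                # ... your workload would run here ...
    metrics = collector.collect()  # flat dict of aggregated metric values

for key, value in sorted(metrics.items()):
    print(f'{key}: {value}')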

BlueskyFR commented 1 year ago

Thanks for your reply. Why are calls to NVML so slow?

nvidia-smi supports refresh intervals as low as 10 ms, for instance

XuehaiPan commented 1 year ago

Why are calls to NVML so slow?

nvidia-smi supports refresh intervals as low as 10 ms, for instance

@BlueskyFR nvidia-smi cannot achieve this.

  1. The query time depends on how many GPU devices are on board.
  2. If persistence mode is disabled, a single nvidia-smi query takes much longer, up to several seconds (e.g., 3 s).
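
As a side note, persistence mode can be checked programmatically with nvidia-ml-py. A minimal sketch, assuming the nvidia-ml-py package is installed and device index 0 exists:

import pynvml  # bindings from the nvidia-ml-py package

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU, as an example
# NVML_FEATURE_ENABLED (1) means persistence mode is on, 0 means off.
mode = pynvml.nvmlDeviceGetPersistenceMode(handle)
print('persistence mode enabled:', mode == pynvml.NVML_FEATURE_ENABLED)
pynvml.nvmlShutdown()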

We can "refresh" the "fake" results every 10ms. But the results may be queried seconds ago. They are not accurate.

Here are some benchmark results from my side. You can try hyperfine on your machine to see the latency.

  • Single NVIDIA 3090 GPU on WSL (persistence mode enabled)
$ hyperfine --warmup 50 --runs 200 nvidia-smi
Benchmark 1: nvidia-smi
  Time (mean ± σ):     113.6 ms ±   8.4 ms    [User: 5.3 ms, System: 3.9 ms]
  Range (min … max):    98.4 ms … 141.4 ms    200 runs
  • 8 x NVIDIA A100 GPU on native Ubuntu (persistence mode enabled)
$ hyperfine --warmup 50 --runs 200 nvidia-smi
Benchmark 1: nvidia-smi
  Time (mean ± σ):      1.920 s ±  0.417 s    [User: 0.007 s, System: 1.298 s]
  Range (min … max):    1.314 s …  4.250 s    200 runs

On the A100 machine, a single query takes about 2 seconds. It cannot possibly run at a 10 ms interval.

BlueskyFR commented 1 year ago

Why are calls to NVML so slow?

nvidia-smi supports refresh intervals as low as 10 ms, for instance

@BlueskyFR nvidia-smi cannot achieve this.

  1. The query time depends on how many GPU devices are on board.
  2. If persistence mode is disabled, a single nvidia-smi query takes much longer, up to several seconds (e.g., 3 s).

We can "refresh" the "fake" results every 10ms. But the results may be queried seconds ago. They are not accurate.

Here are some benchmark results from my side. You can try hyperfine on your machine to see the latency.

  • Single NVIDIA 3090 GPU on WSL (persistence mode enabled)
$ hyperfine --warmup 50 --runs 200 nvidia-smi
Benchmark 1: nvidia-smi
  Time (mean ± σ):     113.6 ms ±   8.4 ms    [User: 5.3 ms, System: 3.9 ms]
  Range (min … max):    98.4 ms … 141.4 ms    200 runs
  • 8 x NVIDIA A100 GPU on native Ubuntu (persistence mode enabled)
$ hyperfine --warmup 50 --runs 200 nvidia-smi
Benchmark 1: nvidia-smi
  Time (mean ± σ):      1.920 s ±  0.417 s    [User: 0.007 s, System: 1.298 s]
  Range (min … max):    1.314 s …  4.250 s    200 runs

On the A100 machine, a single query takes about 2 seconds. It cannot possibly run at a 10 ms interval.

Maybe you are using it wrong 😊

You can see my post here for more details -> https://github.com/influxdata/telegraf/issues/8534#issue-761112264

XuehaiPan commented 1 year ago

Maybe you are using it wrong 😊

You can see my post here for more details -> https://github.com/influxdata/telegraf/issues/8534#issue-761112264

Thanks for the reference. nvitop already uses sparse queries via nvidia-ml-py instead of a full query through nvidia-smi. But some operations are still slow, such as gathering process information, especially when the number of processes is large (up to hundreds). Also, as I mentioned above, if you do not enable persistence mode, an nvidia-smi query will take much longer.
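
For context, a "sparse" query through nvidia-ml-py asks NVML only for the fields it needs instead of dumping everything the way a plain nvidia-smi invocation does. A minimal sketch, where device index 0 and the chosen fields are illustrative and nvitop's real implementation is considerably more involved:

import pynvml  # the nvidia-ml-py bindings used by nvitop

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Query only the fields of interest.
util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # .gpu / .memory (%)
memory = pynvml.nvmlDeviceGetMemoryInfo(handle)      # .total / .used / .free (bytes)
procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)

print(f'GPU util: {util.gpu}%, memory used: {memory.used / 2**20:.0f} MiB, '
      f'{len(procs)} compute processes')

pynvml.nvmlShutdown()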

BlueskyFR commented 1 year ago

So I think maybe it is more of a design problem? Perhaps the same amount of information cannot be obtained with nvidia-smi, but I doubt it

XuehaiPan commented 1 year ago

So I think maybe it is more of a design problem? Perhaps the same amount of information cannot be obtained with nvidia-smi, but I doubt it

In your example, you are not querying process information, which is the key feature of nvitop. If you want accurate metrics, I still think you should use a profiler instead. A day-to-day monitor should not run at a high sampling frequency 24/7; that would lead to high power consumption. If you only need to monitor a process for several minutes, why not use a profiler? It would be the more appropriate tool for your use case.

BlueskyFR commented 1 year ago

That could be a solution; what profiler do you have in mind, for instance?

XuehaiPan commented 1 year ago

That could be a solution; what profiler do you have in mind, for instance?

@BlueskyFR That depends on your use case, because profilers need in-process injection to add hooks that record kernel times, which may require you to update your code. If you are using PyTorch, you may try torch.profiler.profile (pytorch/kineto). It can collect fine-grained metrics and also comes with a web-based GUI integration. You may also try NVIDIA Nsight Systems, a system-wide profiling tool from NVIDIA.
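
For illustration, a minimal torch.profiler.profile sketch; the toy linear model, batch size, and step count are placeholders, and it assumes a CUDA-capable GPU is available:

import torch
from torch.profiler import ProfilerActivity, profile

# A toy workload; replace with your own model / training step.
model = torch.nn.Linear(1024, 1024).cuda()
inputs = torch.randn(64, 1024, device='cuda')

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    record_shapes=True,
) as prof:
    for _ in range(10):
        model(inputs).sum().backward()

# Print the ops that spent the most time on the GPU.
print(prof.key_averages().table(sort_by='cuda_time_total', row_limit=10))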