XuehaiPan / nvitop

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
https://nvitop.readthedocs.io
Apache License 2.0
4.61k stars, 144 forks

simple change to accept float numbers as interval #63

Closed Sms-Rk closed 1 year ago

Sms-Rk commented 1 year ago

Hi, we needed to accept intervals of less than 1 second, so I made a very simple change to allow this. It may not be completely right; if it is not suitable to merge, I would appreciate it if you could implement this for us.

XuehaiPan commented 1 year ago

we needed to accept intervals of less than 1 second, so I made a very simple change to allow this

Hi @Sms-Rk, sorry, but I'm going to close this. I'm pretty sure that this change will not meet your needs. There is also an old discussion in #32 about a similar feature.

Here is something nvitop can do:

  1. Select a process and then press the <Enter> key. The metrics-watching screen will update at an interval of 1/4 sec.

[Screenshot: Process Metrics Screen. Watch metrics for a specific process (shortcut: Enter / Return).]

  2. Use nvitop.ResourceMetricCollector; see the Resource Metric Collector documentation for more information.

If you are working on a delay-sensitive application, I strongly suggest you use a proper profiler rather than a monitor.


FYI, the true interval is defined by TTLCaches. They always run at intervals of 1 second.
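To illustrate why a TTL cache sets the effective interval, here is a minimal, self-contained sketch. It is not nvitop's implementation: the `TTLValue` class, the `count_fresh_readings` helper, and the simulated clock are all hypothetical names made up for this example, and the 1-second TTL is taken from the statement above.

```python
import time


class TTLValue:
    """Cache one reading and refresh it only after `ttl` seconds (hypothetical helper)."""

    def __init__(self, fetch, ttl=1.0, clock=time.monotonic):
        self._fetch = fetch        # performs the real, expensive query
        self._ttl = ttl            # how long a cached reading stays valid
        self._clock = clock        # injectable clock so the demo can fake time
        self._value = None
        self._expires_at = None    # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # The cached reading is missing or stale: query again.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value


def count_fresh_readings(poll_interval, duration, ttl=1.0):
    """Poll a TTL-cached value on a simulated clock and count the real queries."""
    fake_now = [0.0]
    real_queries = [0]

    def fetch():
        real_queries[0] += 1
        return real_queries[0]

    cached = TTLValue(fetch, ttl=ttl, clock=lambda: fake_now[0])
    for step in range(round(duration / poll_interval)):
        fake_now[0] = step * poll_interval  # advance simulated time
        cached.get()
    return real_queries[0]


if __name__ == "__main__":
    for interval in (1.0, 0.25, 0.1):
        print(interval, count_fresh_readings(interval, duration=5.0))
```

In this sketch, polling every 0.1 s over 5 simulated seconds triggers the same 5 real queries as polling once per second; the extra polls only return the cached reading.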

Personally, I would prefer not to support float intervals:

  1. The CLI UI is updated at intervals of 1/4 second. Any interval shorter than that will have no visible effect but will consume far more resources.

https://github.com/XuehaiPan/nvitop/blob/05284ec2f81b029aa16dbeb9487b3112319e4546/nvitop/gui/ui.py#L188-L195

  2. API calls and communication always incur delays, for example, Python <-> NVML C API and NVML C API <-> NVIDIA driver. You always get delayed and inaccurate results, especially when your snapshot interval is small. If the delay matters (if that is why you want the feature in this PR), you should use a profiler.

Sms-Rk commented 1 year ago

Hi @XuehaiPan, thanks for your attention. OK, we need to monitor and profile the SM metric faster than once per second, maybe every 0.1 sec. How can we do it?

XuehaiPan commented 1 year ago

we need to monitor and profile the SM metric faster than once per second, maybe every 0.1 sec

See my comment above. For delay-sensitive use cases, you should use a proper profiling tool rather than a monitor. Profilers need in-process injection. This is not something a monitor can do.

XuehaiPan commented 1 year ago

Implemented in #67.
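
For readers wondering what accepting a fractional interval on the command line could look like, here is a minimal sketch. It is not the code from #67 and makes no claim about how nvitop parses its options: the `interval_seconds` and `build_parser` functions, the `monitor-sketch` program name, and the `MIN_INTERVAL` floor are hypothetical, with the floor chosen to match the 1/4-second UI refresh mentioned earlier in this thread.

```python
import argparse
import math

# Assumed floor for this sketch, based on the 1/4-second UI refresh discussed above.
MIN_INTERVAL = 0.25


def interval_seconds(text):
    """argparse type: parse a positive, finite float and clamp it to MIN_INTERVAL."""
    try:
        value = float(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"invalid interval: {text!r}")
    if not math.isfinite(value) or value <= 0:
        raise argparse.ArgumentTypeError("interval must be a positive, finite number")
    # Shorter intervals would not be visible anyway, so raise them to the floor.
    return max(value, MIN_INTERVAL)


def build_parser():
    """Build a parser with a single float-valued --interval option (illustrative only)."""
    parser = argparse.ArgumentParser(prog="monitor-sketch")
    parser.add_argument(
        "--interval",
        type=interval_seconds,
        default=1.0,
        metavar="SEC",
        help="update interval in seconds (values below %(default)s are allowed, "
        "but nothing below the floor takes effect)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"effective interval: {args.interval} s")
```

Clamping instead of rejecting short intervals is a design choice made for this sketch; reporting an error for values below the floor would be an equally reasonable alternative.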