Closed caotangdaiduong closed 1 year ago
Metric values are retained and not refreshed
Hi @caotangdaiduong, have you set up a Prometheus service to retrieve the latest metrics automatically?
Currently I'm using cron to restart the service every minute. This may sound crazy, but with that workaround the metrics are completely accurate.
I know nvitop's default interval is 1s, and I have tried setting the interval option to other values such as 15s and 30s, but the result is still the same.
@caotangdaiduong I can see the metrics are updating on my side. I'm running watch --differences:
watch --differences 'curl -s http://127.0.0.1:8000/metrics'
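The same check can be scripted. The sketch below assumes the exporter serves the standard Prometheus text format at the URL above and that per-process series carry a `pid` label. It parses two scrapes and reports which PIDs were added, removed, or kept. The helper names are illustrative and are not part of nvitop-exporter.

```python
import re
import urllib.request

# Matches pid="1234" inside the label set of a Prometheus text-format sample line.
PID_LABEL = re.compile(r'\bpid="(\d+)"')


def fetch_metrics(url="http://127.0.0.1:8000/metrics"):
    """Return the exporter's response body as text (requires a running exporter)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8")


def pids_in_scrape(text):
    """Collect every pid label value found on non-comment sample lines."""
    pids = set()
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # skip HELP/TYPE comment lines
        pids.update(int(m) for m in PID_LABEL.findall(line))
    return pids


def compare_scrapes(before, after):
    """Summarize how the set of exported PIDs changed between two scrapes."""
    old, new = pids_in_scrape(before), pids_in_scrape(after)
    return {"added": new - old, "removed": old - new, "kept": old & new}
```

Calling `compare_scrapes(fetch_metrics(), fetch_metrics())` with a pause between the two fetches gives a quick view of the change. If the PID of a process that has already exited keeps showing up under `kept`, the exporter is still exposing a stale series.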
This is similar to Pushgateway: a value is only updated when the same key is reported again, and a new key creates a new series. I think my case is the same situation with many different keys: every time the PID or index changes, a new series is created, and the series for the old PID and index is still there.
The metrics for GPU processes are actively updated on my side.
I can confirm that if the GPU process is gone, the gauge keys still exist. Do you mean you want these keys removed once the corresponding processes are gone?
- You will see that both the old and new PIDs exist when calling curl to the exporter
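One way to picture the cleanup being discussed: on each collection cycle, compare the label sets exported last time with the processes that are alive now, and drop the series that no longer have a process behind them. The sketch below is a stdlib-only illustration built around a hypothetical `ProcessGaugeStore` class and an illustrative metric name; it is not the code from the PR. With the `prometheus_client` library, the analogous step would likely be calling `remove()` on the labeled metric for each stale label set.

```python
class ProcessGaugeStore:
    """Hypothetical in-memory stand-in for a labeled gauge keyed by (index, pid)."""

    def __init__(self):
        self._series = {}

    def update(self, snapshot):
        """Replace the exported series with the current snapshot.

        `snapshot` maps (gpu_index, pid) to the latest value. Keys missing
        from the snapshot belong to processes that are gone, so they are
        dropped instead of lingering with their last value.
        """
        stale = set(self._series) - set(snapshot)
        for key in stale:
            del self._series[key]
        self._series.update(snapshot)
        return stale

    def render(self):
        """Render the series in a Prometheus-like text form (metric name is illustrative)."""
        return [
            f'process_gpu_memory{{index="{index}",pid="{pid}"}} {value}'
            for (index, pid), value in sorted(self._series.items())
        ]
```

Without the pruning step in `update`, the series for the old PID would remain in `render()` output indefinitely, which matches the behavior reported here.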
@caotangdaiduong I can confirm this and have opened PR #107 to resolve it. You can try it via:
python3 -m pip install "git+https://github.com/XuehaiPan/nvitop.git@exporter-remove-gone-process#egg=nvitop-exporter&subdirectory=nvitop-exporter"
Hi @XuehaiPan, thanks for your efforts. I tested it and it works as expected.
Required prerequisites
What version of nvitop are you using?
1.3.1
Operating system and version
Ubuntu 20.04.4 LTS
NVIDIA driver version
510.47.03
NVIDIA-SMI
Python environment
3.8.10 (default, May 26 2023, 14:05:08) [GCC 9.4.0] linux nvidia-ml-py==12.535.133 nvitop==1.3.1 nvitop-exporter==1.3.1
Problem description
nvitop-exporter caches metric values
Metric values are retained and not refreshed
Steps to Reproduce
The Python snippets (if any):
Command lines:
Traceback
No response
Logs
No response
Expected behavior
No response
Additional context
No response