Open boniek83 opened 3 years ago
Hi @boniek83, which version of dcgm-exporter are you using?
nvcr.io/nvidia/k8s/dcgm-exporter:2.1.4-2.2.0-ubuntu20.04. This is the version shipped with gpu-operator v1.6.2.
I think this may be related to what we're seeing in #194. Our biggest nv-hostengine.log was something like 8+ GB.
same issue
Based on feedback from NVIDIA I set the following environment variable to silence the extra logging:
__DCGM_DBG_LVL=NONE
Now all I get in /var/log/nv-hostengine.log is 1 or 2 messages every 30 seconds.
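For anyone wanting to try the same thing, here is a rough sketch of one way the variable could be set on the exporter's DaemonSet. The helper name, the namespace and the DaemonSet name are my assumptions, not something from NVIDIA, so substitute whatever your cluster uses. If the DaemonSet is managed by gpu-operator, the operator may revert manual changes, in which case the variable would have to be set through the operator's own configuration instead.

```shell
# Hypothetical helper: sets __DCGM_DBG_LVL on a DaemonSet's containers.
# The namespace and DaemonSet name are assumptions; pass your own values.
set_dcgm_debug_level() {
  namespace="$1"       # namespace the exporter runs in
  daemonset="$2"       # name of the dcgm-exporter DaemonSet
  level="${3:-NONE}"   # NONE is the value reported to silence the extra logging
  # `kubectl set env` updates the pod template, which triggers a rollout.
  kubectl set env "daemonset/${daemonset}" -n "${namespace}" "__DCGM_DBG_LVL=${level}"
}
```

Usage would look something like `set_dcgm_debug_level <namespace> <daemonset-name>`, followed by checking that /var/log/nv-hostengine.log has stopped growing in the restarted pods.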
Nice, but not good enough, since it still logs something. We don't know whether the amount of data being logged will change between releases. This should be logged to stdout or to a dedicated persistent volume, or there should be an option to disable it altogether.
Its size should be limited by a configurable option, or the file shouldn't be created at all, or a PV/PVC should be used. Ephemeral storage ain't free :)
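Until something like that exists, a stopgap along these lines could keep the file bounded. This is only a sketch: `cap_log_size` is a hypothetical helper, not a dcgm-exporter feature, and I have not verified how nv-hostengine behaves when its log is truncated underneath it, so treat that part as an assumption to test first.

```shell
# Hypothetical helper: empty a log file in place once it exceeds a byte limit.
# Truncating in place keeps the same inode, so the writer's open file
# descriptor stays valid (whether it keeps appending cleanly is untested).
cap_log_size() {
  log_file="$1"    # e.g. /var/log/nv-hostengine.log
  max_bytes="$2"   # size threshold in bytes
  [ -f "$log_file" ] || return 0
  # Arithmetic expansion strips any padding that wc may add to its output.
  size=$(( $(wc -c < "$log_file") ))
  if [ "$size" -gt "$max_bytes" ]; then
    : > "$log_file"
  fi
}
```

It could be called periodically from a sidecar or a cron-style loop, for example `cap_log_size /var/log/nv-hostengine.log 104857600` for a 100 MiB cap.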