Open deepankersharmaa opened 1 year ago
It seems abnormal for over 800 of these processes to be running, is that correct?
It depends on the hardware, the resources given to the exporter, and other factors, though 800 seems high. The processes the dellhw_exporter starts are expected to exit either when they complete or when the command times out.
Is there any report of the dellhw_exporter Pod being in an abnormal state due to the oom-killer (for example, process proliferation like this time)?
There are no known issues with the dellhw_exporter in regards to not closing processes/OOM-ing if given the right amount of resources.
For example, with some monitoring agent applications, there is a scenario where processes proliferate
The processes are not meant to stick around, but it depends on exporter config, etc., how often the exporter would call the commands to get the (latest) info for the metrics.
Can you provide the logs of the dellhw_exporter?
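To illustrate what "closed either when completed or on timeout" can look like in practice, here is a minimal Python sketch. It is not the dellhw_exporter's actual implementation; the helper name `run_with_timeout` and the choice to kill the whole process group are assumptions made for illustration. The idea is that a command that exceeds its timeout is killed together with any helper processes it spawned, and then reaped so that nothing lingers.

```python
import os
import signal
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (stdout, timed_out).

    Hypothetical helper: on timeout, the whole process group is killed
    and the child is reaped, so no process is left behind.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        start_new_session=True,  # child becomes leader of its own process group
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return out, False
    except subprocess.TimeoutExpired:
        # Kill the child and anything it started in the same process group.
        os.killpg(proc.pid, signal.SIGKILL)
        # Reap the child so it does not remain as a zombie.
        proc.communicate()
        return "", True
```

For example, `run_with_timeout(["sleep", "5"], 0.2)` should report a timeout without leaving a `sleep` process running. If timed-out commands are only abandoned rather than killed and reaped, processes can pile up with every scrape.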
Hi,
Thanks for your quick response and support.
Please find attached below the dellhw-exporter container log from the time the problem occurred.
Regards, Deepankar
Hi,
Thanks for your quick response and support. Do we have any update regarding the same?
Regards, Deepankar
The logs show that some omreport command processes are being terminated/taking too long.
Hi @galexrt, thanks for your reply.
I'm running the exporter with the basic command below. No specific config or flags are used, and the scrape interval is 60 seconds. podman run --name pf-dell-exporter -d --privileged -p 9137:9137 {{exporter_image}}
Regards, Deepankar
Hi @galexrt
Any idea about this?
@deepankersharmaa The logs indicate that omreport is taking a long time to respond. Did you look into the Dell OMSA services on the machine if there's anything in their logs? Is that issue happening on a single server or multiple servers?
@galexrt, we are also facing similar issues. It looks like it happens randomly on multiple servers.
As written before, without logs from the system's OMSA services that contain any hints, it is hard to diagnose this.
I don't have access to a Dell server at the moment, so I would appreciate any logs or outputs from OMSA for me to dive in.
Hi,
I have observed a large number of omreport and omcliproxy processes that were started but never exited or terminated.
I am not posting all the results here as it would be redundant, but similar output of approximately 850 lines was seen. It is likely that these processes were started in the dellhw_exporter Pod. From the name of this Pod, I speculate that it is an application similar to an agent for monitoring Dell hardware, as the dellhw_exporter wraps the omreport command to get data from the machine.
Regarding omreport and omcliproxy, I would like to confirm the following:
Do they behave in a way, such as reading all files under /proc including information about the OS, that would sharply increase the load on the system? Are these processes performing any processing that could put load on the system when the number of processes increases rapidly?
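To put a number on the process proliferation described above, a small script can count processes by name. The following is a minimal Python sketch that reads the `comm` file of each numeric entry under `/proc` on Linux; the function name `count_processes` and the `proc_root` parameter are hypothetical and only there for illustration. Note that the kernel truncates `comm` to 15 characters, which is long enough for both `omreport` and `omcliproxy`.

```python
import os
from collections import Counter


def count_processes(names, proc_root="/proc"):
    """Count running processes whose command name is in `names`.

    Hypothetical helper: reads <proc_root>/<pid>/comm for each numeric entry.
    """
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                comm = f.read().strip()
        except OSError:
            continue  # the process exited while we were scanning
        if comm in names:
            counts[comm] += 1
    return counts


if __name__ == "__main__":
    print(count_processes({"omreport", "omcliproxy"}))
```

Running this periodically on an affected host, or inside the exporter container, would show whether the counts keep growing between scrapes or stay flat.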