Open TheRiffRafi opened 2 months ago
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)
@VihasMakwana I think I saw you had root-caused the source of the `OpenProcess failed for pid=1724: The parameter is incorrect` error elsewhere? Or am I misremembering?
@cmacknz yes, that's correct.
On my personal desktop, metricbeat wasn't able to access the following processes, running as root:
This was for the `system.process` integration though. The above issue is about the `windows.service` integration, but I believe the root cause is similar.
@TheRiffRafi do you see any warning related to SeDebugPrivilege at the beginning of logs?
Something like `Metricbeat is running without SeDebugPrivilege, a Windows privilege that allows it to collect metrics...`, `Failure while attempting to enable SeDebugPrivilege`, or `Metricbeat failed to enable the SeDebugPrivilege`?
Can you attach logs from the beginning, if possible?
Hello @VihasMakwana!
Unfortunately I can't help with logs. In every instance of the failure that I have, the logs begin after the problem had already started. We have not caught a case where the issue is absent and then suddenly starts happening (the systems are going weeks without reporting the service).
Also, I have to correct the original description: we have only seen this on 8.10.4. We haven't tested on a more recent version, as the user's entire stack is still on 8.10.4. It was a misunderstanding that we had seen this problem on a later version.
Version: 8.10.4
Operating System:
Steps to Reproduce: No clear steps to reproduce, more info on this later.
Multiple elastic-agent installations are failing to send the windows.service metricset for the windows integration. The system integration continues to send data without issues. The problem happens at random and is resolved by restarting the elastic agent. ~The issue happens in different versions of 8.x for elastic-agent and it hasn't been confirmed as occurring on the latest version (as the user who has experienced this has not upgraded to latest version yet).~ The issue so far has only been seen on 8.10.4.
The error reported by metricbeat is the following:
So far the error indicates a problem with only one particular Windows service. However, once that service gets into an unexpected state, the entire metricbeat windows service metricset stops reporting, so none of the other monitored services are reported either.
Because this happens at random, we are unable to set up debug logging to catch the failure, and the logger for this function does not provide any more info.
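To illustrate the behaviour we would hope for, here is a minimal sketch in which a failure on one service is recorded with its name and the remaining services are still reported. This is not the actual metricbeat code: `ServiceStatus`, `queryFunc`, `collectServices` and `fakeQuery` are hypothetical names, and the simulated error is a placeholder, not the real one.

```go
package main

import (
	"errors"
	"fmt"
)

// ServiceStatus is a hypothetical, simplified record for one Windows service.
type ServiceStatus struct {
	Name string
	PID  uint32
}

// queryFunc stands in for whatever call reads a single service's details.
// It is an assumption for this sketch, not a real metricbeat function.
type queryFunc func(name string) (ServiceStatus, error)

// collectServices queries every service and keeps going when one of them
// fails. Each failure is wrapped with the service name so the log shows
// which service was in an unexpected state.
func collectServices(names []string, query queryFunc) ([]ServiceStatus, error) {
	var statuses []ServiceStatus
	var errs []error
	for _, name := range names {
		st, err := query(name)
		if err != nil {
			errs = append(errs, fmt.Errorf("service %q: %w", name, err))
			continue // skip this service, do not abort the whole fetch
		}
		statuses = append(statuses, st)
	}
	// errors.Join (Go 1.20+) returns nil when errs is empty.
	return statuses, errors.Join(errs...)
}

// fakeQuery simulates one service that cannot be read.
func fakeQuery(name string) (ServiceStatus, error) {
	if name == "BrokenSvc" {
		return ServiceStatus{}, errors.New("simulated query failure")
	}
	return ServiceStatus{Name: name, PID: 100}, nil
}
```

Calling `collectServices([]string{"Spooler", "BrokenSvc", "W32Time"}, fakeQuery)` in this sketch returns the two healthy services plus a joined error that names `BrokenSvc`, so the caller can log the failure and still publish events for the rest.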
We need to address 2 items with this issue: