GoogleCloudPlatform / ops-agent

Apache License 2.0

Ops Agent Frequent Errors Written to Event Log. #1782

Open vermiciouskid opened 3 weeks ago

vermiciouskid commented 3 weeks ago

We're currently seeing a lot of noisy, low-value errors for system processes. I was hoping the recent addition of mute_process_exe_error in PR #1748 would resolve this for us, but no luck. Perhaps adding the rest of the process-error mute options would clear this up?

https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/receiver/hostmetricsreceiver/README.md#process

mute_process_name_error: <true|false>
mute_process_exe_error: <true|false>
mute_process_io_error: <true|false>
mute_process_user_error: <true|false>
mute_process_cgroup_error: <true|false>
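For reference, in an upstream OpenTelemetry Collector these flags sit under the process scraper of the hostmetrics receiver. The fragment below is a sketch of a plain collector config, assuming the flags listed above are available in the collector version in use. It is not an Ops Agent config.yaml: the Ops Agent generates its collector configuration itself, so these keys can't be dropped into the agent's config as-is. It is also not established in this thread whether any of these flags covers the "error reading command" message in the log shown below.

```yaml
# Sketch of an upstream OpenTelemetry Collector config (NOT an Ops Agent
# config.yaml). Receiver layout follows the hostmetrics receiver README;
# the interval value is an illustrative assumption.
receivers:
  hostmetrics:
    collection_interval: 2m   # two-minute interval, as in this report
    scrapers:
      process:
        # Suppress per-process errors from the process scraper.
        mute_process_name_error: true
        mute_process_exe_error: true
        mute_process_io_error: true
        mute_process_user_error: true
        mute_process_cgroup_error: true
```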

This happens every two minutes, which is what we've set the collection interval to:

1.7243456346814048e+09 error scraperhelper/scrapercontroller.go:197 Error scraping metrics {"kind": "receiver", "name": "hostmetrics/hostmetrics", "data_type": "metrics", "error": "error reading command for process \"lsass.exe\" (pid 580): could not get CommandLine: cannot read process PEB", "scraper": "process"}
go.opentelemetry.io/collector/receiver/scraperhelper.(controller).scrapeMetricsAndReport
    C:/Users/ContainerAdministrator/go/pkg/mod/go.opentelemetry.io/collector/receiver@v0.102.0/scraperhelper/scrapercontroller.go:197
go.opentelemetry.io/collector/receiver/scraperhelper.(controller).startScraping.func1
    C:/Users/ContainerAdministrator/go/pkg/mod/go.opentelemetry.io/collector/receiver@v0.102.0/scraperhelper/scrapercontroller.go:173

braydonk commented 3 weeks ago

We have code in our build of the collector that is supposed to turn off these error logs entirely. It must have stopped working on Windows. I'll investigate.

Just to confirm, you are on the latest release of Ops Agent?

vermiciouskid commented 3 weeks ago

Yes, I try to keep our fleet on the latest release, so we're on 2.50.0 at the moment.