Open zbjornson opened 2 years ago
I experience this same issue with COS version 93:
# cat /etc/os-release
NAME="Container-Optimized OS"
ID=cos
PRETTY_NAME="Container-Optimized OS from Google"
HOME_URL="https://cloud.google.com/container-optimized-os/docs"
BUG_REPORT_URL="https://cloud.google.com/container-optimized-os/docs/resources/support-policy#contact_us"
GOOGLE_CRASH_ID=Lakitu
GOOGLE_METRICS_PRODUCT_ID=26
KERNEL_COMMIT_ID=435e3f6b0837d398051855e22b245142aceb1ec6
VERSION=93
VERSION_ID=93
BUILD_ID=16623.39.6
# journalctl -u google-guest-agent.service -p 3
-- Journal begins at Fri 2021-10-22 22:19:33 UTC, ends at Fri 2021-10-22 22:58:32 UTC. --
Oct 22 22:19:41 elasticsearch-instance GCEGuestAgent[352]: 2021-10-22T22:19:41.4680Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart nscd.service: Unit nscd.service not found.
.
Oct 22 22:19:41 elasticsearch-instance GCEGuestAgent[352]: 2021-10-22T22:19:41.4773Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart unscd.service: Unit unscd.service not found.
.
Oct 22 22:19:42 elasticsearch-instance GCEGuestAgent[352]: 2021-10-22T22:19:42.1232Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart cron.service: Unit cron.service not found.
.
Oct 22 22:19:42 elasticsearch-instance GCEGuestAgent[352]: 2021-10-22T22:19:42.2261Z GCEGuestAgent Error oslogin.go:109: Error restarting service: Failed to try-restart crond.service: Unit crond.service not found.
.
Oct 22 22:19:42 elasticsearch-instance GCEGuestAgent[352]: 2021-10-22T22:19:42.2421Z GCEGuestAgent Error oslogin.go:116: Error reloading service: Failed to reload-or-restart ssh.service: Unit ssh.service not found.
.
When many servers start or restart at once, we get hundreds of these errors, which end up triggering server alerts. Could someone please let us know whether this is the same as #134 and is therefore being worked on? (Should I open a GCP Support case?)
The background for these log messages: on startup, the guest agent makes configuration changes, then restarts the affected services so the changes take effect. It logs an error when a service isn't found, but this is benign.
We actually already reduced this extraneous logging in #122, so if you use an updated version of the guest agent, these logs should go away. I think some of our partner distributions, e.g. Ubuntu or COS, have not yet received this change.
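Until an updated agent reaches a given image, one possible stopgap is to filter these lines out before they reach alerting. The following is only a minimal sketch, assuming the message format shown in the journal output earlier in this thread; the function names `is_missing_unit_error` and `filter_guest_agent_errors` are hypothetical and not part of the guest agent.

```shell
#!/bin/sh
# Hypothetical helper (not part of google-guest-agent): succeeds when a log
# line is one of the "Unit <name>.service not found" errors from oslogin.go.
is_missing_unit_error() {
    case "$1" in
        *"GCEGuestAgent Error oslogin.go:"*": Unit "*".service not found."*) return 0 ;;
        *) return 1 ;;
    esac
}

# Reads log lines on stdin and prints only those that are not the benign
# missing-unit errors.
filter_guest_agent_errors() {
    prev_benign=0
    while IFS= read -r line || [ -n "$line" ]; do
        if is_missing_unit_error "$line"; then
            prev_benign=1
            continue
        fi
        # The journal output in this thread shows a lone "." after each such
        # error; drop it only when it directly follows a filtered line.
        if [ "$prev_benign" -eq 1 ] && [ "$line" = "." ]; then
            prev_benign=0
            continue
        fi
        prev_benign=0
        printf '%s\n' "$line"
    done
}
```

It could be used as `journalctl -u google-guest-agent.service -p 3 | filter_guest_agent_errors`. Matching on the full "Unit ... not found" text rather than on `oslogin.go` alone is deliberate, so that other errors from the same file still get through.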
Thanks @hopkiw. Indeed the latest version available from/for Ubuntu is 20210629.00. Do you know if there's a way to accelerate the release of a new version? (Is that done by Canonical or Google?)
Canonical takes updates on a regular cadence, except for critical vulnerabilities, where we will ask them to prioritize an update or patch. I don't know if end users can influence the process, but I imagine you might try filing a bug with them.
When google-guest-agent tries to start, it seems to try to start nscd, unscd, cron and crond, but those units are not present on our servers.
Are these benign? If so, can they be downgraded from Errors?
These same error lines appear in #134, but in my case, the service is active/running, not dead.