GoogleCloudPlatform / osconfig

Apache License 2.0
85 stars 102 forks source link

`nil` `tasker` causes panic while stopping agent #661

Open mathe-matician opened 1 month ago

mathe-matician commented 1 month ago

Issue

In the if block below, if tasker is nil, this causes a panic,

https://github.com/GoogleCloudPlatform/osconfig/blob/235ada847231bb3331f35193da05751a29e9af67/main.go#L247-L255

Sep 04 21:02:41 myhost.internal OSConfigAgent[433]: 2024-09-04T21:02:41.6314Z OSConfigAgent Info: Restart required marker file exists, beginning agent shutdown, waiting for tasks to complete.
Sep 04 21:02:41 myhost.internal google_osconfig_agent[433]: panic: close of nil channel
Sep 04 21:02:41 myhost.internal google_osconfig_agent[433]: goroutine 81 [running]:
Sep 04 21:02:41 myhost.internal google_osconfig_agent[433]: github.com/GoogleCloudPlatform/osconfig/tasker.Close()
Sep 04 21:02:41 myhost.internal google_osconfig_agent[433]:         /tmp/debpackage/google-osconfig-agent-20240501.03/tasker/tasker.go:58 +0x45
Sep 04 21:02:41 myhost.internal google_osconfig_agent[433]: main.runInternalPeriodics({0xf67778, 0xc00051a180})
Sep 04 21:02:41 myhost.internal google_osconfig_agent[433]:         /tmp/debpackage/google-osconfig-agent-20240501.03/main.go:249 +0xca
Sep 04 21:02:41 myhost.internal google_osconfig_agent[433]: created by main.runServiceLoop in goroutine 1
Sep 04 21:02:41 myhost.internal google_osconfig_agent[433]:         /tmp/debpackage/google-osconfig-agent-20240501.03/main.go:267 +0x8d
Sep 04 21:02:41 myhost.internal systemd[1]: google-osconfig-agent.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Sep 04 21:02:41 myhost.internal systemd[1]: google-osconfig-agent.service: Failed with result 'exit-code'.
Sep 04 21:02:42 myhost.internal systemd[1]: google-osconfig-agent.service: Scheduled restart job, restart counter is at 1.
Sep 04 21:02:42 myhost.internal systemd[1]: Stopped google-osconfig-agent.service - Google OSConfig Agent.

Proposed Fix

Probably to check if tasker is nil before using it, that would at least allow the code to reach the range deferredFuncs and not cause a panic.

That being said, I'm not completely sure how tasker was nil in the first place, which may be the underlying issue.