edgelesssys / constellation

Constellation is the first Confidential Kubernetes. Constellation shields entire Kubernetes clusters from the (cloud) infrastructure using confidential computing.
GNU Affero General Public License v3.0
903 stars 47 forks source link

debugd: reset unit failed status before restarting #3183

Closed daniel-weisse closed 2 weeks ago

daniel-weisse commented 2 weeks ago

Context

On debug images, our bootstrapper and upgrade-agent systemd units continuously fail until the binaries are uploaded using cdbg deploy. Once this is done, the debugd issues a systemctl restart command to those units. Under some circumstances, this restart can fall into the timeout period from our units being rate limited in restarting by systemd. We can use systemctl reset-failed to reset the failed counter and by this way bypass the timeout.

See https://bugzilla.redhat.com/show_bug.cgi?id=1016548 for some more details about the issue. See the man page for systemctl reset-failed.

Proposed change(s)

Related issue

Additional info

Checklist

netlify[bot] commented 2 weeks ago

Deploy Preview for constellation-docs canceled.

Name Link
Latest commit 9385e634a6137a6376ad41e864daa68c617a63f6
Latest deploy log https://app.netlify.com/sites/constellation-docs/deploys/6673d8b5fb247d0008a8baf0