Twingate / helm-charts

Official Twingate Helm Charts
MIT License

Configure Readiness and Liveness Probes #42

Open mattparkes opened 4 months ago

mattparkes commented 4 months ago

This Helm Chart doesn't configure Readiness or Liveness Probes, so Pods stay in a broken state indefinitely if something goes wrong, instead of being restarted automatically.

For extra context, I believe our specific situation was a Kubernetes cluster that scaled down out of hours but still needed Twingate access. DNS was accidentally broken in this cluster out of hours, and when the cluster scaled back up the Twingate connector never recovered (which is a separate issue from this one) until the Pods were manually deleted/restarted.

[msg] All nameservers have failed
[msg] Nameserver 172.20.0.10:53 is back up
[msg] Nameserver 172.20.0.10:53 has failed: request timed out.
[msg] All nameservers have failed

This could/would/should have been fixed automatically by a Liveness Probe.

I noticed that there is documentation about Connector Health Checks and so I think it's just a matter of adding a liveness probe that checks if this is returning OK or not.
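As a rough sketch of what I mean, the probes could run the same `/connectorctl health` command shown in the health check docs. This is only an illustration: the timing values are guesses on my part, and it assumes `/connectorctl health` exits non-zero when the connector is unhealthy, which I haven't verified.

```yaml
# Hypothetical fragment for the connector container spec in the chart's Deployment.
# Assumption: `/connectorctl health` exits 0 when healthy and non-zero otherwise.
livenessProbe:
  exec:
    command: ["/connectorctl", "health"]
  initialDelaySeconds: 10   # illustrative value, not tuned
  periodSeconds: 10         # illustrative value, not tuned
  timeoutSeconds: 5
  failureThreshold: 3       # restart the container after 3 consecutive failures
readinessProbe:
  exec:
    command: ["/connectorctl", "health"]
  periodSeconds: 10
  timeoutSeconds: 5
```

Ideally these would be exposed through the chart's values so they can be tuned or disabled per deployment.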

Interestingly, this doesn't seem to solve my exact issue: my pod, which is showing as "Controller Could not connect" in the Twingate Console, still returns OK from the health check:

kubectl exec -ti -n cloud twingate-a-857875bfb7-s2x29 -- /connectorctl health
OK

linear[bot] commented 4 months ago

OSS-12 Configure Readiness and Liveness Probes