envoyproxy / xds-relay

Caching, aggregation, and relaying for xDS compliant clients and origin servers
Apache License 2.0

xdsrelay container restarts #193

Closed (jyotimahapatra closed this issue 3 years ago)

jyotimahapatra commented 3 years ago

While rolling out in staging, I noticed too many container restarts:

➜  xds-relay git:(str) cs4 -n xdsrelay-staging get pods
NAME                                READY   STATUS    RESTARTS   AGE
xdsrelay-staging-7995b598d5-tjv56   9/9     Running   10         15h
xdsrelay-staging-7995b598d5-vvf96   9/9     Running   2          15h

This can cause xdsrelay downtime if multiple containers restart at the same time.

jyotimahapatra commented 3 years ago

It looks like the readiness probes are failing:

Events:
  Type     Reason     Age                   From     Message
  ----     ------     ----                  ----     -------
  Warning  Unhealthy  36m                   kubelet  Readiness probe failed: Get http://10.43.181.59:6070/ready: read tcp 10.43.55.163:56620->10.43.181.59:6070: read: connection reset by peer
  Warning  Unhealthy  11m (x3 over 15h)     kubelet  Readiness probe failed: Get http://10.43.181.59:6070/ready: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
  Normal   Started    10m (x6 over 15h)     kubelet  Started container xdsrelay-service-gojson
  Warning  Unhealthy  6m15s                 kubelet  Readiness probe failed: Get http://10.43.181.59:6070/ready: read tcp 10.43.55.163:37882->10.43.181.59:6070: read: connection reset by peer
  Warning  BackOff    5m36s (x10 over 34m)  kubelet  Back-off restarting failed container

jyotimahapatra commented 3 years ago

Trying to fix this with #194 and #195.

jyotimahapatra commented 3 years ago

https://github.com/envoyproxy/xds-relay/commit/04d6532390c8cf8302e28cc2559f6e0069b87c78