UKHomeOffice / vault-sidekick

Vault sidekick
Apache License 2.0

Exit on failure to renew lease #89

Status: Closed (cpick closed this issue 4 years ago)

cpick commented 5 years ago

During a recent network outage, one of my vault-sidekick Pods was unable to reach Vault for several consecutive lease-renewal attempts, each timing out with:

failed to renew the resource: type: gcp, path: gcp/key/service for renewal, error: Put https://vault/v1/sys/leases/renew: dial tcp: i/o timeout

As a result, the lease expired and subsequent renewals failed with:

failed to renew the resource: type: gcp, path: gcp/key/service for renewal, error: Error making API request.

URL: PUT https://vault/v1/sys/leases/renew
Code: 400. Errors:

  • lease not found or lease is not renewable

rescheduling the resource: type: gcp, path: gcp/key/service, channel: 0xdeadbeef

This failure repeated indefinitely until someone noticed the situation and restarted the Pod manually. Ideally, vault-sidekick would have recognized this as a non-recoverable error and exited. The Pod would then have been restarted automatically, obtained a new lease, and become healthy again.

vault-sidekick v0.3.9 was run with (roughly) the following configuration:

./vault-sidekick -v=3 -vault=https://vault -renew-token -cn=gcp:gcp/key/service:renew=true

cpick commented 5 years ago

If this behavior sounds reasonable (and it's not available under some existing option that I've overlooked) I'd be happy to submit a patch.