cyberark / secretless-broker

Secure your apps by making them Secretless
Apache License 2.0

Conjur K8s authenticator does not fail gracefully or retry when token fails to refresh #1035

Closed izgeri closed 4 years ago

izgeri commented 4 years ago

Summary

The DAP authn-k8s client retrieves a cert from the DAP follower, authenticates with the cert to retrieve a time-limited DAP access token, and is expected to re-authenticate every six minutes (this interval is set in the Secretless source and the authn-k8s client definition). When the cert expires, Secretless should detect that it is no longer logged in and log in again to retrieve a new cert.

At present, it appears that if the authenticate request fails, Secretless neither retries authentication nor fails gracefully. Instead, it keeps trying (and failing) to retrieve credentials with each new request. See the logs below for more info.

Steps to Reproduce

Steps to reproduce the behavior:

  1. Deploy Secretless with an app in K8s/OC configured to use the DAP K8s authenticator with a DAP follower
  2. Wait for Secretless to successfully authenticate once, and then temporarily make the DAP follower unavailable so that the next authentication request fails 6 minutes later
  3. Observe that Secretless does not keep trying to authenticate, and that the Secretless container does not fail, so it is not automatically rescheduled by K8s/OC

Expected Results

On authentication failure, Secretless should retry with exponential backoff a limited number of times. If it does not manage to authenticate, the container should fail so that it can be redeployed.

Actual Results (including error logs, if applicable)

In practice, the container does not keep trying to re-authenticate, and although the Conjur provider reports errors, the container does not fail and is not rescheduled:

2019/12/02 15:57:50 Secretless v1.3.0-0da8bbe starting up...
2019/12/02 15:57:50 Initializing health check on :5335...
2019/12/02 15:57:50 Initialization of health check done. You can access the endpoint at `/live` and `/ready`.
2019/12/02 15:57:50 [WARN]  Plugin hashes were not provided - tampering will not be detectable!
2019/12/02 15:57:50 Trying to load configuration file: /etc/secretless/secretless.yml
2019/12/02 15:57:50 WARN: 'protocol' key found on service 'postgres-connector'. 'protocol' is now deprecated and will be removed in a future release.
2019/12/02 15:57:50 WARN: 'protocol' key found on service 'mysql-connector'. 'protocol' is now deprecated and will be removed in a future release.
2019/12/02 15:57:50 Registering reload signal listeners...
2019/12/02 15:57:50 Instantiating provider 'conjur'
2019/12/02 15:57:50 Info: Conjur provider using Kubernetes authenticator-based authentication
2019/12/02 15:57:50 Info: Conjur provider is authenticating as host/conjur/authn-k8s/openshift/xa-secretless/apps/my-namespace/service_account/secretless-xa ...
...
INFO: 2019/12/02 15:58:20 authenticator.go:174: Not logged in. Trying to log in...
INFO: 2019/12/02 15:58:20 authenticator.go:107: Logging in as host/conjur/authn-k8s/openshift/xa-secretless/apps/my-namespace/service_account/secretless-xa.
INFO: 2019/12/02 15:58:20 requests.go:21: Login request to: https://conjur-follower.xa-secretless.svc.cluster.local/api/authn-k8s/openshift%2Fxa-secretless/inject_client_cert
INFO: 2019/12/02 15:58:20 authenticator.go:181: Logged in
INFO: 2019/12/02 15:58:20 authenticator.go:163: Cert expires: 2019-12-05 15:58:11 +0000 UTC
INFO: 2019/12/02 15:58:20 authenticator.go:164: Current date: 2019-12-02 15:58:20.395362726 +0000 UTC
INFO: 2019/12/02 15:58:20 authenticator.go:165: Buffer time:  30s
INFO: 2019/12/02 15:58:20 requests.go:44: Authn request to: https://conjur-follower.xa-secretless.svc.cluster.local/api/authn-k8s/openshift%2Fxa-secretless/xa/host%2Fconjur%2Fauthn-k8s%2Fopenshift%2Fxa-secretless%2Fapps%2Fmy-namespace%2Fservice_account%2Fsecretless-xa/authenticate
INFO: 2019/12/02 15:58:20 authenticator.go:245: Successfully authenticated!
2019/12/02 16:05:42 Info: Conjur provider is authenticating as host/conjur/authn-k8s/openshift/xa-secretless/apps/my-namespace/service_account/secretless-xa ...
INFO: 2019/12/02 16:05:59 authenticator.go:163: Cert expires: 2019-12-05 15:58:11 +0000 UTC
INFO: 2019/12/02 16:06:37 authenticator.go:164: Current date: 2019-12-02 16:05:44.666902504 +0000 UTC
INFO: 2019/12/02 16:06:59 authenticator.go:165: Buffer time:  30s
INFO: 2019/12/02 16:11:29 requests.go:44: Authn request to: https://conjur-follower.xa-secretless.svc.cluster.local/api/authn-k8s/openshift%2Fxa-secretless/xa/host%2Fconjur%2Fauthn-k8s%2Fopenshift%2Fxa-secretless%2Fapps%2Fmy-namespace%2Fservice_account%2Fsecretless-xa/authenticate
2019/12/02 16:11:45 Info: Conjur provider received an error on authenticate: Post https://conjur-follower.xa-secretless.svc.cluster.local/api/authn-k8s/openshift%2Fxa-secretless/xa/host%2Fconjur%2Fauthn-k8s%2Fopenshift%2Fxa-secretless%2Fapps%2Fmy-namespace%2Fservice_account%2Fsecretless-xa/authenticate: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2019/12/02 16:11:46 ERROR: Resolving credential 'conjur/xa-secretless-db/postgresql/hostname' from provider 'conjur' failed: Get https://conjur-follower.xa-secretless.svc.cluster.local/api/secrets/xa/variable/conjur%2Fxa-secretless-db%2Fpostgresql%2Fhostname: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
...
2019/12/02 16:12:43 ERROR: Resolving credential 'conjur/xa-secretless-db/postgresql/port' from provider 'conjur' failed: Get https://conjur-follower.xa-secretless.svc.cluster.local/api/secrets/xa/variable/conjur%2Fxa-secretless-db%2Fpostgresql%2Fport: dial tcp 172.30.79.165:443: connect: no route to host
...
2019/12/02 16:12:46 [ERROR] postgres-connector: Failed on handle connection: failed on retrieve credentials: ERROR: Resolving credential 'conjur/xa-secretless-db/postgresql/hostname' from provider 'conjur' failed: Get https://conjur-follower.xa-secretless.svc.cluster.local/api/secrets/xa/variable/conjur%!F(MISSING)xa-secretless-db%!F(MISSING)postgresql%!F(MISSING)hostname: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
...
2019/12/02 18:40:34 ERROR: Resolving credential 'conjur/xa-secretless-db/postgresql/port' from provider 'conjur' failed: Unauthorized: Invalid token.

izgeri commented 4 years ago

Here are the full logs from the example above. The example ran for a few days, but to save space I cut the logs off after the first set of Invalid token errors.

More detailed logs

izgeri commented 4 years ago

Note - I was able to reproduce this by running the following:

Logs attached here

sgnn7 commented 4 years ago

Resulting behavior with the fixes in the linked PR:

[Screenshots: Screen Shot 2019-12-11 at 11 41 43, Screen Shot 2019-12-11 at 11 43 18]

sgnn7 commented 4 years ago

Log with the fixes in the linked PR: secretless-retry.txt