cyberark / conjur

CyberArk Conjur automatically secures secrets used by privileged users and machine identities
https://conjur.org
Other
760 stars 123 forks source link

Cert injection is not done properly in authn-k8s #1807

Closed orenbm closed 4 years ago

orenbm commented 4 years ago

Summary

Currently, we close the WebSocket connection after we send the injection request to kubectl. This is not correct as this way we do not wait for kubectl to send its response.

This has 2 implications:

  1. If the injection request has succeeded we will return a 200 OK response    to the 'inject_client_cert' request even if the cert has not finished to    be written to the pod. This can lead to a race-condition where the client    will try to read the cert before it is there.
  2. If the injection request has failed in the k8s API, we will not    write the error to the Conjur logs, and we will falsely return a 200    OK response to the 'inject_client_cert' request

We need to verify that the cert is written to the client's container before returning a 200 OK response.

Steps to Reproduce

Steps to reproduce the behavior:

  1. Simulate an error on the client that will fail the injection (e.g do not give the user permissions on /etc/conjur/ssl)
  2. Run the authn-client
  3. See that you still got a 200 OK response from inject_client_cert although the cert was not copied.

Expected Results

Actual Results (including error logs, if applicable)

Reproducible

eladkug commented 4 years ago

This bug should be fixed as part of this bug

izgeri commented 4 years ago

@orenbm won't waiting to return 200 until the cert injection is complete cause the API request to hang?

Our current design is basically (as I understand it):

I think if we're seriously considering a better pattern than our current one we should consider something like what the Open Service Broker spec uses:

If we are going to revise the flow here, I think @cyberark/aam-architects should weigh in on the impact of having the API hang mid-request.

orenbm commented 4 years ago

thanks @izgeri . Actually the suggested approach here is the current one, at least as seen in the code. Unless i misunderstand the code, we verify that the cert is injected before returning 200 OK. the issue is that we are not doing it properly. you can see the flow here. let me know if you understand it different that me. That's why i have treated this as a bug and not as a behaviour change.

However, i am open to re-thinking this approach and designing it better to not wait for a response but to write the errors to the log nevertheless (which is not happening today).