cyberark / conjur

CyberArk Conjur automatically secures secrets used by privileged users and machine identities
https://conjur.org

Discussion: Options for Authenticator Decoupling #2355

Open jonahx opened 3 years ago

jonahx commented 3 years ago

Options for Authenticator De-coupling

Overview

There are two main approaches we can take to de-coupling the authenticators:

  1. Full decoupling -- Make each authenticator an independent web service.
    • Necessarily, access to Conjur is only via the API.
    • Writes directly to Audit log socket for audited events.
    • Non-audit logs are separate from Conjur's.
    • Requires configuration (possibly automated) in nginx (or whichever webserver you use). Authenticator services define their own routes, which are mounted under a standard path. For example: authn/k8s or authn/oidc.
    • Existing authenticators can be ported to simple Sinatra or Rack apps. Additional code will be minimal: just the route definitions, which will then point to the existing code unchanged.
    • Conjur will provide tokens via authn-local.
  2. Partial decoupling -- Make each authenticator a gem, but still execute code within Conjur's process.
    • Authenticator provides routes, but Conjur code writes those routes into existing config/routes.rb file.
    • Authenticator constructors accept objects giving access to rails logger, audit logger, and interfaces to relevant database queries. These latter need to be designed.
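To make option 2 more concrete, here is a minimal sketch of what a gem-packaged authenticator could look like. It is only an illustration under stated assumptions: the module name `AuthnExample`, the `ROUTES` constant, the route path, and the `role_lookup` interface are all hypothetical and are not part of Conjur's existing code. The point is that the authenticator receives its loggers and database access through its constructor and has no dependency on Rails.

```ruby
require 'logger'

# Hypothetical authenticator packaged as a gem. Every name below is
# illustrative; none of it is Conjur's actual API.
module AuthnExample
  class Authenticator
    # Route declarations that the host application (Conjur) would read and
    # write into its own config/routes.rb. The path is an assumption.
    ROUTES = [
      { verb: :post,
        path: 'authn-example/:account/:login/authenticate',
        action: :authenticate }
    ].freeze

    # logger:       general-purpose logger (for example the Rails logger)
    # audit_logger: logger for audited events
    # role_lookup:  callable (account, login) -> role hash or nil, standing
    #               in for the database query interface still to be designed
    def initialize(logger:, audit_logger:, role_lookup:)
      @logger = logger
      @audit_logger = audit_logger
      @role_lookup = role_lookup
    end

    # Returns true when the role exists and is allowed to use this
    # authenticator. Real credential verification is out of scope here.
    def authenticate(account:, login:)
      role = @role_lookup.call(account, login)
      success = !role.nil? && role[:authenticators].include?('authn-example')
      outcome = success ? 'success' : 'failure'
      @audit_logger.info("authn-example #{outcome} for #{account}:#{login}")
      @logger.debug("authn-example handled request for #{login}")
      success
    end
  end
end
```

Because the dependencies are injected, the authenticator can be tested in isolation by passing plain `Logger` instances and a lambda for `role_lookup`, which is one way a gem could carry its own fast test suite.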

Full Decoupling

Pros

Cons

Discussion

These are notes based on a conversation with Matthew Brace. Like me, he strongly prefers this approach if we can make it secure. One requirement for security is to protect the authn-local socket with authentication of its own, so that if the box itself is owned it won't immediately mean that all Conjur secrets are owned. One promising idea, which could also simplify configuration of the services, is to:

  1. Use rack's middleware proxy in Conjur to route requests to the separate-process authenticator services.
  2. Inject a short-lived, one-time-use authentication token that can then be used by the authenticator to access the protected authn-local socket which provides tokens.
  3. Use SSL between Conjur and the authenticator services.
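The first two steps above can be sketched roughly as follows. This is a hedged illustration, not a proposed implementation: the class names `OneTimeTokens` and `AuthenticatorProxy`, the `HTTP_X_AUTHN_LOCAL_TOKEN` header, and the `forwarders` map are assumptions made for the example. The forwarder is a plain callable standing in for an HTTPS client that would talk to the separate authenticator process, so step 3 (SSL) is not shown. The middleware follows the Rack convention of an object responding to `call(env)` and needs no gems.

```ruby
require 'securerandom'

# Hypothetical store of short-lived, single-use tokens that the protected
# authn-local socket would check before issuing a Conjur token.
class OneTimeTokens
  def initialize(ttl_seconds: 5,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @tokens = {}
    @mutex = Mutex.new
  end

  # Creates a new token and drops any that have already expired.
  def issue
    token = SecureRandom.hex(32)
    @mutex.synchronize do
      now = @clock.call
      @tokens.delete_if { |_, expires_at| expires_at < now }
      @tokens[token] = now + @ttl
    end
    token
  end

  # Returns true at most once per token, and only before it expires.
  def redeem(token)
    expires_at = @mutex.synchronize { @tokens.delete(token) }
    !expires_at.nil? && @clock.call <= expires_at
  end
end

# Rack-style middleware: requests under /authn/<name> are handed to the
# matching forwarder with a fresh token injected; all others pass through.
class AuthenticatorProxy
  AUTHN_PATH = %r{\A/authn/([a-z0-9]+)(?:/|\z)}.freeze

  # forwarders: hash of authenticator name => callable(env) returning a
  # Rack response triple. In practice the callable would proxy over HTTPS.
  def initialize(app, tokens:, forwarders:)
    @app = app
    @tokens = tokens
    @forwarders = forwarders
  end

  def call(env)
    match = AUTHN_PATH.match(env['PATH_INFO'].to_s)
    forwarder = match && @forwarders[match[1]]
    return @app.call(env) unless forwarder

    forwarder.call(env.merge('HTTP_X_AUTHN_LOCAL_TOKEN' => @tokens.issue))
  end
end
```

With this shape, a compromised authenticator process holds at most one token at a time, valid for a few seconds and for a single exchange, which is the property the authn-local protection is after.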

Partial Decoupling

Pros

Cons

infamousjoeg commented 3 years ago

I'm not going to reply from Ben's perspective, but from the perspective of the customers I encounter in the Southeast region of the US.

They barely understand it as it stands today. All of Ben's "Fully Decoupling - General Feedback" section is completely spot on from my users' perspective.

If this was started from feedback from DevOps SMEs, I'd like it to be discussed with them prior to any further action on this, to confirm that the experience you're accepting from an R&D perspective matches what their customers expect.

jonahx commented 3 years ago

These are comments @rafis3 sent me offline. I am responding to them here for visibility.

I am missing your opinion, which is very important to me. What is your thought and recommendation and why?

I think there are definitely tradeoffs, and agree with some of your concerns, but as I said, fully decoupled strikes me as the superior long-term solution. I will elaborate below...

Originally, we discussed that we need to understand whether we want to go the route of making the authenticators pluggable, or whether the effort, the code complexity, and the decoupling process are too hard to make it worth it.

Imo it needs to be done, if for no other reason than that it is a requirement to reduce our CI times to something reasonable. Our current CI is ~1hr. It's an absolute productivity killer.

If we created partial plugins based on an engine, rather than a gem, we could have had the routes in the plugin as well, right?

Yes, but I would strongly prefer to avoid engines. They bring in all the complexity of a complete Rails app, and couple our authenticator code to Rails, for no reason. The authenticator logic has nothing to do with Rails, and taking on that dependency means that: it is yet another tool for developers to learn (engines have subtle differences from Rails); it is yet another thing that will need updates and might have CVEs; and the code will be more bloated than it needs to be. Also, as compared with vanilla middleware, they don't change anything from the POV of the problems I raised.

The full decoupling might have performance implications, because there will be more networking between services. Kind of a step back to how Conjur v4 worked. Might not be significant, but still a risk, I think. The footprint will probably be larger with the full decoupling, if running more processes and more web services, because each brings its own stack, which is an overhead.

This is a genuine concern I agree with. Unless we're very careful, it is also likely to add to complexity. Imo these are serious concerns that we need to discuss and find good solutions for. I think it is probably possible to do so.

Long-term, imo this is probably the right solution. This is a subjective “pro” as opposed to the others, which are technical. So I think it’s worth adding why it’s the right solution; someone else could think differently if no explanation is provided.

My reasons are:

rafis3 commented 3 years ago

Imo it needs to be done, if for no other reason than that it is a requirement to reduce our CI times to something reasonable. Our current CI is ~1hr. It's an absolute productivity killer.

There is more than one way to improve the CI. We can trigger the tests that run as part of a PR and commit based on the area that you touched. The authenticator tests don't always have to run, but can give asynchronous feedback later. Since they also run in parallel, adding more authenticators doesn't necessarily mean more time, with the important assumption that our tests are stable. We must have stable tests, so that more tests don't increase the chance of random failures.
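As a rough illustration of triggering tests based on the area touched, a selection step could map changed file paths to test suites. This is a sketch under assumptions: the path prefixes, the suite names, and the `suites_for` helper are hypothetical, and the list of changed files would come from the CI system or from something like `git diff --name-only`.

```ruby
# Hypothetical mapping from source path prefixes to the test suites that
# cover them. Paths and suite names are illustrative only.
SUITES_BY_PREFIX = {
  'app/domain/authentication/authn_k8s/' => [:authn_k8s],
  'app/domain/authentication/authn_oidc/' => [:authn_oidc],
  'app/domain/authentication/' => [:authn_core]
}.freeze

# Suites that run on every change regardless of what was touched.
ALWAYS_RUN = [:unit].freeze

ALL_SUITES = (ALWAYS_RUN + SUITES_BY_PREFIX.values.flatten).uniq.freeze

# Returns the suites to run synchronously for a list of changed files.
# A file outside every known prefix falls back to the full set, so an
# unmapped area is never silently skipped.
def suites_for(changed_files)
  selected = ALWAYS_RUN.dup
  changed_files.each do |file|
    matches = SUITES_BY_PREFIX.select { |prefix, _| file.start_with?(prefix) }
    return ALL_SUITES if matches.empty?

    selected.concat(matches.values.flatten)
  end
  selected.uniq
end
```

The suites that are not selected could still run asynchronously after merge, which matches the idea of giving feedback later rather than skipping those tests entirely.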

So what I would love to focus our conversation on is ROI. What do we invest, and how we are going to get the value. If the ROI is too low, then while the direction is good, it might not be justified to spend a few months of work. Because once we make a decision to go in that direction, we need to have a clear and realistic plan of how we make it happen.

jonahx commented 3 years ago

@rafis3 Fwiw, after a very long discussion with Ben, he's convinced me that option 1 (at least the version described above) is not feasible for field operations, and so it's off the table for now. We only want to find solutions that work for everyone.

rafis3 commented 3 years ago

Ok, let's discuss, align and capture what we learned.