jonahx commented 3 years ago

Options for Authenticator De-coupling

Overview

There are two main approaches we can take to de-coupling the authenticators:

Full decoupling -- Make each authenticator an independent web service.
- Necessarily, access to Conjur is only via the API.
- Writes directly to Audit log socket for audited events.
- Non-audit logs are separate from Conjur's.
- Requires configuration (possibly automated) in nginx (or whichever webserver you use). Authenticator services define their own routes, which are mounted under a standard path. For example: authn/k8s or authn/oidc.
- Existing authenticators can be ported to simple Sinatra or Rack apps. Additional code will be minimal: just the route definitions, which will then point to the existing code unchanged.
- Conjur will provide tokens via authn-local.
Partial decoupling -- Make each authenticator a gem, but still execute code within Conjur's process.
- Authenticator provides routes, but Conjur code writes those routes into existing config/routes.rb file.
- Authenticator constructors accept objects giving access to rails logger, audit logger, and interfaces to relevant database queries. These latter need to be designed.
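
To make the full-decoupling option concrete, here is a rough sketch of what "just the route definitions" could look like as a bare Rack app (any object whose `call(env)` returns status, headers, and body). Everything named here is an assumption for illustration: `ExistingAuthenticator` stands in for the current authenticator code, `token_source` stands in for a call to authn-local, and the `/:account/:login/authenticate` route shape is hypothetical. The webserver is assumed to mount the app under a path such as authn/k8s.

```ruby
require "json"

# Stand-in for the existing authenticator logic, which the route calls
# unchanged. Hypothetical class and method names.
class ExistingAuthenticator
  def valid?(account:, login:, credentials:)
    !credentials.to_s.empty? # placeholder check only
  end
end

# Bare Rack app: the only new code is the route matching in #call.
class AuthenticatorApp
  # Hypothetical route shape: POST /:account/:login/authenticate
  ROUTE = %r{\A/(?<account>[^/]+)/(?<login>[^/]+)/authenticate\z}

  # token_source is any callable (account, login) -> token. In the proposal
  # it would ask authn-local for a token; here it is injected as an assumption.
  def initialize(authenticator:, token_source:)
    @authenticator = authenticator
    @token_source = token_source
  end

  def call(env)
    match = ROUTE.match(env["PATH_INFO"].to_s)
    unless match && env["REQUEST_METHOD"] == "POST"
      return respond(404, { error: "not found" })
    end

    credentials = (env["rack.input"]&.read).to_s
    authenticated = @authenticator.valid?(
      account: match[:account], login: match[:login], credentials: credentials
    )
    return respond(401, { error: "unauthorized" }) unless authenticated

    respond(200, { token: @token_source.call(match[:account], match[:login]) })
  end

  private

  def respond(status, payload)
    [status, { "content-type" => "application/json" }, [JSON.generate(payload)]]
  end
end
```

Because the app never touches Conjur's internals, it could run under any Rack-compatible server, which is the property the full-decoupling option relies on.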

Full Decoupling

Pros

Long-term, imo this is probably the right solution.
Restricting Conjur access through API is cleaner and more maintainable.
Enables fully pluggable 3rd party authenticators, as requested by the field and sales, should we decide to allow that (eg, via UNSAFE flags or similar).
Can be written in any language.

Cons

Requires configuration (even if automated) of the webserver itself, since each authenticator will run as its own service. We can't mount them as middleware within Conjur, because they need to make requests to Conjur themselves.
While audit logs can write to the same socket that Conjur does, the individual services will need their own logs, or we'll need an aggregation solution. This means more files to check when debugging.
Security is (arguably) harder. We already have the active authn-local which mints tokens, so even right now anyone could write their own fully pluggable authenticator and configure the webserver to serve it. All we'd be doing is using that existing functionality officially ourselves. The security model (which, again, we already implicitly have) is essentially "anyone who is able to configure a new service on the web server has the authority to mint tokens."

Discussion

These are notes based on a conversation with Matthew Brace. Like me, he strongly prefers this approach if we can make it secure. One requirement for security is to protect the authn-local socket with authentication of its own, so that if the box itself is owned it won't immediately mean that all Conjur secrets are owned. One promising idea, which could also simplify configuration of the services, is to:

Use rack's middleware proxy in Conjur to route requests to the separate-process authenticator services.
Inject a short-lived, one-time-use authentication token that can then be used by the authenticator to access the protected authn-local socket which provides tokens.
Use SSL between Conjur and the authenticator services.
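
As a rough illustration of the short-lived, one-time-use token idea, the sketch below keeps issued tokens in memory and lets each one be redeemed at most once before it expires. This is only one possible design under stated assumptions: the class name, the 5-second default TTL, and the in-process storage are all hypothetical, and nothing here reflects how authn-local actually works today.

```ruby
require "securerandom"

# Hypothetical in-memory registry of single-use, short-lived tokens.
# Conjur's proxy would call #issue before forwarding a request to an
# authenticator service; the guard in front of authn-local would call
# #redeem when that service asks for a Conjur token.
class OneTimeTokenRegistry
  def initialize(ttl_seconds: 5,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl_seconds = ttl_seconds
    @clock = clock           # injectable so expiry can be tested
    @expiry_by_token = {}
    @mutex = Mutex.new
  end

  # Returns a fresh random token that is valid for ttl_seconds.
  def issue
    token = SecureRandom.hex(32)
    @mutex.synchronize { @expiry_by_token[token] = @clock.call + @ttl_seconds }
    token
  end

  # True only the first time an unexpired token is presented.
  # Deleting on lookup is what makes the token one-time-use.
  def redeem(token)
    expires_at = @mutex.synchronize { @expiry_by_token.delete(token) }
    !expires_at.nil? && @clock.call < expires_at
  end
end
```

A token that is replayed, expired, or never issued is rejected the same way, so an authenticator service cannot reuse a token to mint further Conjur tokens.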

Partial Decoupling

Pros

Still have just a single Conjur service.
No messing with web server configurations.
Security is simpler since new authenticators can only be enabled by an explicit PR into the Conjur repo. No possibility of a non-vetted authenticator ever being used.

Cons

Not truly decoupled. Since we are providing non-API database access via objects, a new authenticator that requires a query we don't already provide will require a new database access object to be written, and thus a separate PR.
Even when that's not the case, updates to authenticators will always require both the PR on the authenticator gem and a separate PR on Conjur to bring in the update. This makes development less convenient.
Authenticators can only be written in ruby.
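
For comparison, here is a minimal sketch of the partial-decoupling shape, where a gem-packaged authenticator receives its loggers and database access through its constructor. The class names and the `role_exists?` query are hypothetical; as noted in the overview, the real query interfaces still need to be designed.

```ruby
require "logger"

# Hypothetical read-only query object that Conjur would build and pass in.
# A new authenticator needing a different query would need a new object
# like this one, which is the coupling described in the cons above.
class RoleQueries
  def initialize(known_role_ids)
    @known_role_ids = known_role_ids
  end

  def role_exists?(role_id)
    @known_role_ids.include?(role_id)
  end
end

# Hypothetical gem-packaged authenticator. It never reaches into Conjur
# directly; everything it needs arrives through the constructor.
class GemAuthenticator
  def initialize(logger:, audit_logger:, role_queries:)
    @logger = logger
    @audit_logger = audit_logger
    @role_queries = role_queries
  end

  def authenticate(role_id:, credentials:)
    @logger.debug("authenticating #{role_id}")
    success = @role_queries.role_exists?(role_id) && credentials_valid?(credentials)
    @audit_logger.info("authn role=#{role_id} success=#{success}")
    success
  end

  private

  # Placeholder; a real authenticator would verify a certificate, JWT, etc.
  def credentials_valid?(credentials)
    !credentials.to_s.empty?
  end
end
```

In tests the loggers and query object can be swapped for fakes, which is one benefit of this approach even though the code still runs inside Conjur's process.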

infamousjoeg commented 3 years ago

I'm not going to reply from Ben's perspective, but from the perspective of the customers I encounter in the Southeast region of the US.

They barely understand it as it stands today. All of Ben's "Fully Decoupling - General Feedback" section is completely spot on from my users' perspective.

If this was started from feedback from DevOps SMEs, I'd like it to be discussed with them prior to any further action on this to confirm the experience you're accepting from an R&D perspective is the same as their customers.

jonahx commented 3 years ago

These are comments @rafis3 sent me offline. I am responding to them here for visibility.

I am missing your opinion, which is very important to me. What is your thought and recommendation and why?

I think there are definitely tradeoffs, and agree with some of your concerns, but as I said, fully decoupled strikes me as the superior long-term solution. I will elaborate below...

Originally, we discussed that we need to understand whether we want to go the route of making the authenticators pluggable, or whether the effort, the code complexity, and the decoupling process are too hard to make it worth it.

Imo it needs to be done, if for no other reason than that it is a requirement to reduce our CI times to something reasonable. Our current CI is ~1hr. It's an absolute productivity killer.

If we created partial plugins based on an engine, rather than a gem, we could have had the routes in the plugin as well, right?

Yes, but I would strongly prefer to avoid engines. They bring in all the complexity of a complete Rails app, and couple our authenticator code to Rails, for no reason. The authenticator logic has nothing to do with Rails, and taking on that dependency means that: it is yet another tool for developers to learn (engines have subtle differences from Rails); it is yet another thing that will need updates and might have CVEs; and the code will be more bloated than it needs to be. Also, as compared with vanilla middleware, they don't change anything from the POV of the problems I raised.

The full decoupling might have performance implications, because there will be more networking between services. Kind of a step back to how Conjur v4 worked. Might not be significant, but still a risk I think. The footprint will probably be larger with the full decoupling if we run more processes and more web services, because each brings its own stack, which is overhead.

This is a genuine concern I agree with. Unless we're very careful, it is also likely to add to complexity. Imo these are serious concerns that we need to discuss and find good solutions for. I think it is probably possible to do so.

"Long-term, imo this is probably the right solution." This is a subjective “pro”, as opposed to the others, which are technical. So I think it’s worth adding why it’s the right solution; someone else could think differently if no explanation is provided.

My reasons are:

It enables 3rd party development, which the field and sales engineers have been requesting for a long time. I understand the security concerns around this but believe they can be addressed with appropriate UNSAFE flags and warnings.
Allows authenticators to be written in any language.
Allows authenticators to scale independently of Conjur.
It's conceptually simpler (though, as you noted, operationally more complex).

rafis3 commented 3 years ago

Imo it needs to be done, if for no other reason than that it is a requirement to reduce our CI times to something reasonable. Our current CI is ~1hr. It's an absolute productivity killer.

There is more than one way to improve the CI. We can choose which tests run as part of a PR or commit based on the area that you touched. The authenticator tests don't always have to run, but can give asynchronous feedback later. Since they also run in parallel, adding more authenticators doesn't necessarily mean more time, with the important assumption that our tests are stable. We must have stable tests, so that more tests don't increase the chance of random failures.
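
One possible shape for "run tests based on the area you touched" is a lookup from changed paths to suites, sketched below. The directory prefixes and suite names are purely hypothetical and do not describe the actual Conjur repository layout or CI configuration.

```ruby
# Hypothetical mapping from source-tree prefixes to the suites that cover them.
SUITES_BY_PREFIX = {
  "authenticators/k8s/"  => ["authn_k8s"],
  "authenticators/oidc/" => ["authn_oidc"]
}.freeze

# Hypothetical suites that always run synchronously on every PR.
CORE_SUITES = ["unit", "api"].freeze

# Given the files changed in a PR, return the suites to run right away.
# Suites not returned here could still run later and report asynchronously.
def suites_for(changed_files)
  targeted = changed_files.flat_map do |path|
    SUITES_BY_PREFIX.select { |prefix, _suites| path.start_with?(prefix) }.values.flatten
  end
  (CORE_SUITES + targeted).uniq
end
```

For example, a PR touching only a file under the hypothetical `authenticators/k8s/` directory would run the core suites plus `authn_k8s`, and leave `authn_oidc` for the asynchronous run.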

So what I would love to focus our conversation on is ROI. What do we invest, and how are we going to get the value? If the ROI is too low, then while the direction is good, it might not be justified to spend a few months of work. Because once we make a decision to go in that direction, we need to have a clear and realistic plan of how we make it happen.

jonahx commented 3 years ago

@rafis3 Fwiw, after a very long discussion with Ben, he's convinced me that option 1 (at least the version described above) is not feasible for field operations, and so it's off the table for now. We only want to find a solution that works for everyone.

rafis3 commented 3 years ago

Ok, let's discuss, align and capture what we learned.

cyberark / conjur

Discussion: Options for Authenticator Decoupling #2355
