cert-manager / trust-manager

trust-manager is an operator for distributing trust bundles across a Kubernetes cluster.
https://cert-manager.io/docs/projects/trust-manager/
Apache License 2.0

overriding trusted namespace #60

Open PaulusTM opened 2 years ago

PaulusTM commented 2 years ago

In my setup the Trust controller runs in the same namespace as the cert-manager installation ( ns: cert-manager ).

I have a certificate that is issued by an Issuer running in the monitoring namespace (ECK deployment). I would like to replicate this CA with the Trust operator, so I made a Bundle in the monitoring namespace, expecting this namespace to be the trusted namespace.

After reading some of the issues, it appears that only the cert-manager namespace is trusted, which seems strange to me. The whole point of this project, for me, is to provide CA bundles to other namespaces so they can connect to secured microservices.

Is there a way to override the trusted namespace with config? I don't like that my teams would need to deploy the Trust Operator in their own namespaces; that seems strange.

I have made a clusterrole so the SA in the cert-manager namespace has read/list access to the secrets in the namespaces that have Issuers. This can be scoped to specific secrets so you don't leak.

Love to hear your opinions.

SgtCoDFish commented 1 year ago

> I have made a clusterrole so the SA in the cert-manager namespace has read/list access to the secrets in the namespaces that have Issuers. This can be scoped to specific secrets so you don't leak.

I'd love to see this if you're able to share!


Addressing your issue, there are a few points:

Your use case

I'd love to know more about your use case. I agree that users shouldn't have to run a trust-manager instance in each namespace. Is it a major problem for users to have all potential sources in one namespace?

I often think about trust-manager in a multi-tenanted Kubernetes environment, where data from one tenant shouldn't be shared with any others. I'm guessing that's not your use case here?

Configuring the trusted namespace

The trusted namespace is configurable in Helm via app.trust.namespace, which defaults to cert-manager. It sets the --trust-namespace flag on the controller binary (see here).
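
As a minimal sketch, the Helm value described above could be set in a values file like the one below. Only app.trust.namespace and the --trust-namespace flag come from this comment; the namespace name trust-sources is a hypothetical placeholder.

```yaml
# values.yaml (sketch) for the trust-manager Helm chart.
app:
  trust:
    # Namespace that Bundle sources are read from. The chart passes this
    # to the controller as --trust-namespace; it defaults to "cert-manager".
    # "trust-sources" is a hypothetical name chosen for illustration.
    namespace: trust-sources
```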

I think I could see a justification for making it an array, so you could load sources from several namespaces, but I probably wouldn't want to encourage that for general use (see below).

The default namespace name of "cert-manager"

I think "cert-manager" actually isn't a great default because the cert-manager namespace might contain a lot of secrets which trust-manager shouldn't have access to read (e.g. a CA ClusterIssuer's private key). Maybe it should change in the future to make it clear that it's separate.

Enabling Safe Rotation

I think having a separate trusted namespace makes sense from an operational perspective because it encourages users to avoid shooting themselves in the foot when it comes to rotating a trusted certificate.

To be able to safely rotate a root certificate R1 and replace it with R2 users will generally need to trust both R1 and R2 simultaneously for a period during the rotation. If a user has a Secret source issued by cert-manager and the certificate is rotated in-place (i.e. the old cert is replaced by a new one in the same Secret), the trust bundle will flip from only trusting R1 to only trusting R2, meaning that any service which hasn't got a new certificate from R2 yet will be distrusted.

It's pretty subtle and would make a good blog post! I'd encourage users to essentially copy + paste the CA certificate into a ConfigMap in the trust namespace and to be wary of automation when it comes to Bundles, unless they have a very clear root certificate rotation plan in place.
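
A rough sketch of the copy-into-a-ConfigMap approach follows. It assumes the v1alpha1 Bundle API (trust.cert-manager.io/v1alpha1) and the default trust namespace of cert-manager; the resource names and the key names are hypothetical, and the PEM body is a placeholder to be filled in by hand.

```yaml
# A manually maintained copy of the CA certificate, kept in the trust namespace.
apiVersion: v1
kind: ConfigMap
metadata:
  name: eck-root-ca          # hypothetical name
  namespace: cert-manager    # the trust namespace (default per this thread)
data:
  ca.crt: |
    -----BEGIN CERTIFICATE-----
    ...copied PEM data...
    -----END CERTIFICATE-----
---
# The Bundle reads the ConfigMap above and writes the combined bundle
# into a ConfigMap in the target namespaces.
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: eck-trust            # hypothetical name; Bundles are cluster-scoped
spec:
  sources:
    - configMap:
        name: eck-root-ca
        key: ca.crt
  target:
    configMap:
      key: trust-bundle.pem  # hypothetical key name in the target ConfigMap
```

Because the ConfigMap only changes when someone edits it, a renewal of the cert-manager Secret does not alter the bundle until the new certificate is copied in deliberately.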

Does that make sense?

PaulusTM commented 1 year ago

The secret hack:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  labels:
    app.kubernetes.io/instance: cert-manager-trust
  name: cert-manager-trust-secret-reader
rules:
  # Read-only access to Secrets, which live in the core ("") API group.
  - apiGroups:
      - ""
    resources:
      - "secrets"
    verbs:
      - "get"
      - "list"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cert-manager-trust-secret-reader
  namespace: monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cert-manager-trust-secret-reader
subjects:
  - kind: ServiceAccount
    name: cert-manager-trust
    namespace: cert-manager
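
The earlier remark that access "can be scoped to specific secrets" might look like the sketch below, which uses a namespaced Role with resourceNames instead of the ClusterRole. The Role name and the Secret name eck-ca are hypothetical. Note the assumption: resourceNames cannot restrict an ordinary list request, so this sketch grants only get, and whether trust-manager functions with get alone is not confirmed anywhere in this thread.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cert-manager-trust-ca-reader   # hypothetical name
  namespace: monitoring
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    # Only this one Secret is readable; "eck-ca" is a hypothetical name.
    resourceNames: ["eck-ca"]
    # "list" is omitted because resourceNames does not limit plain list calls.
    verbs: ["get"]
```

A RoleBinding like the one above would then reference this Role (kind: Role) in its roleRef instead of the ClusterRole.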

Your comments make a lot of sense, but automation is key in large Kubernetes deployments. Not having to manually copy/paste the CA certificate in the trust namespace is why I even started to look at trust-manager.

I solved my issues by making ClusterIssuers and an intermediate certificate in the cert-manager namespace. This intermediate certificate is shared across all namespaces.

The apps now trust the CA that issued the ELK certificates and the problem is solved.

My initial idea was that the ELK namespace would need to have the ClusterIssuer to be able to issue certs.

SgtCoDFish commented 1 year ago

Thanks for sharing!

> Your comments make a lot of sense, but automation is key in large Kubernetes deployments. Not having to manually copy/paste the CA certificate in the trust namespace is why I even started to look at trust-manager.

Honestly, this drives right at the heart of it! I totally 100% agree that automation is key, and I wouldn't want to get in the way of that. But there's a genuine footgun here which will absolutely cause outages if it's not worked around with a careful reissuance policy. Automation is key, but the wrong automation is both easy to do and catastrophic. Automating the generation of a trust store is really fraught and one of my goals for trust-manager is to ensure that the easy thing is the right thing that won't blow up later.

One day we'll have this stuff blogged + documented :grin:

qrkourier commented 1 year ago

Following with interest, and I'll leave my brief use case summary for multiple trust namespaces at @irbekrm of JetStack's request after I asked for help in the k8s Slack.

I found myself wanting multiple namespaces because of the way I am using trust-manager. It is possible that I am doing something nonsensical, so please feel free to correct my understanding or approach.

I have an application (OpenZiti) that needs a bunch of certificates and decided to provide those with cert-manager and I installed the app in a separate namespace from cert-manager. The app also needs to publish a trust bundle so I used trust-manager to compose the bundle and sync it to the app's main namespace and a few others that house clients of the server app.

Now I want to install a second instance of my app in yet another namespace, so I am forced to choose between running a redundant trust-manager instance (if that is even possible without colliding ClusterIssuer names) or putting all instances of my app in the same namespace.

mheese commented 3 months ago

Ouch. This issue is unfortunately kind of the nail in the coffin for me for trust-manager. :(

@PaulusTM 's workaround with different issuers/clusterissuers is unfortunately not an option for me. However, maybe installing trust-manager into multiple namespaces might work. Is that even a supported option @SgtCoDFish ?

The key problem here in my case is to use an issuer to create CA certificates that one wants to distribute to multiple namespaces (ideally simply with a namespace label selector to inject the trust bundle as configmap, which is technically supported). However, the CA secret needs to be in a namespace where an application which is creating certificates by using that CA has access to it, so it cannot be in the cert-manager namespace. Copying the secrets around isn't an acceptable solution obviously either. But setting the trust namespace now only to that namespace is pretty limiting as well: now what do I do for all the other cases of bundles that I want to distribute? So, installing a 2nd trust-manager into the application's namespace might be a solution although that feels weird, because ideally this should really be just handled by a single global trust manager.
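
For illustration, the label-selector distribution mentioned above could be sketched as follows, assuming the v1alpha1 Bundle API. All names and the label are hypothetical. The sketch also shows the constraint under discussion: the Secret source is looked up in the single trust namespace, not in the application's namespace.

```yaml
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: app-ca-bundle        # hypothetical name
spec:
  sources:
    # Read from the trust namespace only; this is the limitation at issue.
    - secret:
        name: app-ca         # hypothetical Secret name
        # Which key holds the CA certificate depends on how it was issued.
        key: tls.crt
  target:
    configMap:
      key: ca-bundle.pem     # hypothetical key name
    # Only namespaces carrying this (hypothetical) label receive the ConfigMap.
    namespaceSelector:
      matchLabels:
        app-ca-trust: "enabled"
```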

> I think I could see a justification for making it an array, so you could load sources from several namespaces, but I probably wouldn't want to encourage that for general use (see below).

I honestly don't understand why. This is exactly the use-case when there are multiple custom CAs involved in clusters. (And yes, I read your explanation which I don't understand, but would love to understand).

> I think "cert-manager" actually isn't a great default because the cert-manager namespace might contain a lot of secrets which trust-manager shouldn't have access to read

I understand where you are coming from, but this argument also does not make a lot of sense. Cert-manager is an application which manages that. It needs the access, and everyone can review the code and see that it is not misusing its access rights. If this is still a concern, that's what RBAC is good for. That's a lot of leg work, but when necessary one has the tools to deal with this.

> I often think about trust-manager in a multi-tenanted Kubernetes environment, where data from one tenant shouldn't be shared with any others.

Again, this is what RBAC is good for. Just because one would get some "access denied" messages in cert-manager wouldn't mean that cert-manager is misbehaving. That's simply what access controls are for. However, because RBAC can be so involved, that is probably also the reason why I have not seen multi-tenant Kubernetes clusters outside of OpenShift (where multi-tenancy actually also makes sense).

> I'd encourage users to essentially copy + paste the CA certificate into a ConfigMap in the trust namespace and to be wary of automation when it comes to Bundles, unless they have a very clear root certificate rotation plan in place.
>
> Does that make sense?

Uh, to be frank: no. One is looking for a solution like cert-manager so that it takes over the automation of rotation, etc. If one needs to manually manage rotation, then this really defeats the purpose of the tool to begin with. I understand that safe rotation which takes multiple versions of previous certificates from certificate sources into account isn't available as of today, but this is really what these CRDs should be able to handle automatically (and yes, this can get hairy, particularly when one thinks of revoked certs, for example. I didn't say this is easy :) ).

I would really love to hear your thoughts on this.

svengreb commented 3 months ago

This is not directly related to the actual scope of the ticket, but since some of you in this issue rightly object to manually copying secrets, I'd like to mention two solutions we came up with:

  1. All our clusters are managed via Flux, where the secret can be "packaged" as its own Kustomization and synced into any namespace by referencing it in a Kustomization per namespace. This keeps the actual secret as the single source of truth, while still allowing it to be synced to any desired namespace.
  2. We started to use the kubernetes-replicator controller, which can sync any Secret, ConfigMap and ServiceAccount (optionally Role and RoleBinding) to any namespace, with advanced features like regular expression selection. All of this is done through annotations, either on the "source" (push) or the target (pull). The creator is a trusted and experienced hosting company in Germany and the project is actively maintained and stable.
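
A hedged sketch of the push-style annotation from item 2, assuming the controller meant here is mittwald's kubernetes-replicator and its replicate-to annotation, and assuming a cert-manager Certificate whose Secret is annotated via secretTemplate. The names, namespaces and issuer are hypothetical.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: app-ca               # hypothetical name
  namespace: my-app          # hypothetical application namespace
spec:
  isCA: true
  commonName: app-ca
  secretName: app-ca
  issuerRef:
    name: selfsigned         # hypothetical ClusterIssuer
    kind: ClusterIssuer
  secretTemplate:
    annotations:
      # Push a copy of the resulting Secret into the trust namespace.
      # The copy includes the private key, so weigh that before using it.
      replicator.v1.mittwald.de/replicate-to: "cert-manager"
```

A Bundle could then reference the replicated Secret in the trust namespace as a source.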

We actually switched to almost only use the replicator method while the Flux Kustomization is used for any other object kind that is not supported by the replicator (whose main focus was to only solve the secret-cross-namespace problem in the first place anyway). And yes, we also use it for cert-manager and trust-manager and it works great.

SgtCoDFish commented 3 months ago

That's super interesting, thanks for sharing @svengreb !

@mheese I hear what you're saying and I get it. I think looking back now I'd relax my concerns about setting multiple trust namespaces - I don't see any particular reason we couldn't add that. I can confirm that running trust-manager in multiple namespaces isn't currently an option.

I totally understand the desire for automation - it's one of the biggest reasons that cert-manager is so successful! The issue is that automatically picking up a rotated root is really dangerous. I think you get it, but I'll restate the point anyway:

If we point trust-manager at a cert-manager secret directly, so that we're saying "we trust this certificate", then when that cert is renewed we'll immediately distrust the old certificate (R1) and immediately trust the new certificate (R2). Anything which still has a cert issued by R1 will immediately break.

That's why that's so dangerous - the normal process would be to go from trusting R1, to then trusting R1 and R2, to then dropping R1 and trusting only R2. That's the primary reason I talk about copying from a cert-manager secret. As above, I think you understand this.
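
To make the three phases concrete, here is a sketch of a Bundle during the middle phase, assuming the v1alpha1 Bundle API and two hand-maintained ConfigMaps in the trust namespace. The names root-r1 and root-r2 and the key names are hypothetical.

```yaml
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: org-roots            # hypothetical name
spec:
  sources:
    # Phase 1: only this entry exists, so workloads trust R1.
    - configMap:
        name: root-r1
        key: ca.crt
    # Phase 2: this entry is added, so workloads trust both R1 and R2
    # while leaf certificates are reissued from R2.
    - configMap:
        name: root-r2
        key: ca.crt
    # Phase 3: once nothing depends on R1, the root-r1 entry is removed.
  target:
    configMap:
      key: trust-bundle.pem  # hypothetical key name
```

Each phase change is an explicit edit to the Bundle, which is what prevents the sudden flip from R1 to R2.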

If we somehow keep track of R1 and keep trusting it, that would kinda work but it requires us to come up with a clear system for allowing users to explicitly distrust certs and stop them being tracked.

I'd honestly love for trust-manager to be all-singing and all-dancing in this regard but it's an extreme amount of specialised work to get this right and there simply isn't the time for it. I think our current model of encouraging users to copy things about is clunky but I think it encourages safe practices and it works today.

So in summary... I'd like trust-manager to do more, but we don't have anything close to the amount of time needed for that today. I would definitely consider allowing multiple trust namespaces though - that seems like a smaller improvement that could help.

There's one thing I'd like to gently push back on:

> If one needs to manually manage rotation, then this really defeats the purpose of the tool to begin with.

I respectfully disagree with this because trust-manager isn't aiming to aid with automatic rotation!

trust-manager is a tool to stitch together trusted certificates from various sources in a cluster and provide them in a single source of truth, and I think it's good at that. It allows users to only need to update one trust store via the Bundle, rather than needing to rebuild all their containers to update all their trust stores.

Plus, because it makes trust stores easier to update, it also makes rotation easier as a side effect of that I think. Not automated, sure - but easier!

If trust-manager isn't the right tool for your use case that's totally fine! I genuinely believe it's a useful tool for most clusters but it's quite narrowly focused and I think that's ok! (That's not to say we can't change its focus in the future, of course!)

mheese commented 3 months ago

@SgtCoDFish great answer! And thanks for taking the time to reply. I appreciate it.

> > If one needs to manually manage rotation, then this really defeats the purpose of the tool to begin with.
>
> I respectfully disagree with this because trust-manager isn't aiming to aid with automatic rotation!

I admit that this point was a bit over the top. It was me ranting for sure :) trust-manager is certainly still useful. I most certainly apologize for that comment.

I've been managing/rotating/revoking/signing/cross-signing certs for more or less my whole career, and I can't tell you how tired I am of this process. Since basically everything has certificates these days, the work has multiplied enormously. This is, by the way, one of the main reasons why cert-manager (and trust-manager) are so great.

> That's why that's so dangerous - the normal process would be to go from trusting R1, to then trusting R1 and R2, to then dropping R1 and trusting only R2. That's the primary reason I talk about copying from a cert-manager secret. As above, I think you understand this.

Yes, I think we all agree that this is the crux. And one needs to be really careful with this. This gets even more complicated when CAs / intermediate CAs are cross-signed.

However, I'm a bit saddened to hear that you don't consider this a goal. Let me explain: I have been in situations before where we reviewed a trust store manually with multiple people, pushed it, and still made a mistake (which caused some outages). Looking back at this incident, if we had had programmatic logic that reviewed and built the trust store for us, we could probably have avoided it. My point is: automation is great because it avoids mistakes. So yes, I think we all agree that this is dangerous. However, it looks like we're drawing different conclusions from it :) ... Because it is dangerous, I want this to be automated so that I can avoid making mistakes like this again, while it looks like you are thinking: this is dangerous, so folks need to review it manually before they push it.

I also understand that this is no easy endeavour/undertaking to design and implement. And it's pretty obvious that the current CRDs aren't prepared for that. I'd be happy to think this through with you guys if you'd be accepting of the general idea of course.

@svengreb mentioned the kubernetes-replicator which might actually be an acceptable workaround for this (worth mentioning maybe in the docs?). I just don't like that it means that there is now yet another tool to install/maintain. Multiple trust namespaces would most certainly be a nicer way out of it.

Evesy commented 3 months ago

We had a fairly simple use case that trust-manager would have been a nice fit for.

We use Cloudflare authenticated origin pulls (mTLS) to our origin using their own provided certificate (which our origin trusts), but we also want to be able to hit our origin ourselves, so we would have a cert-manager issued Certificate from a private internal CA that the origin would also trust. This means we want a certificate bundle composed of one static hardcoded certificate (Cloudflare's) and another CA certificate that cert-manager manages, which would live in the same namespace as our origin ingress controller.

With this in mind, I'd hoped trust-manager could be deployed to some 'central' namespace (risks aside, it living in the cert-manager namespace feels most appropriate), and we could deploy a Bundle resource in the namespace of the origin ingress controller, and it would be allowed to reference any certificates that also reside in that same namespace; rather than a specific trust-manager instance having to be deployed in the same namespace as the ingress controller.

This use-case is maybe slightly different, as we're not looking to distribute CA bundles across the cluster but just want a convenient way of building a CA bundle for a single workload to consume. In this scenario a trust-manager flag, e.g. --trust-own-namespace or similar, that would allow a Bundle to reference any sources if they're in the same namespace, would work wonders.
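
As a sketch only, the bundle described in this comment (one static certificate plus one cert-manager managed CA) could be expressed with the v1alpha1 Bundle API as below. Under the current model the Secret would have to live in the trust namespace rather than next to the ingress controller, which is the gap being described. All names are hypothetical and the PEM body is a placeholder.

```yaml
apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: origin-client-cas    # hypothetical name
spec:
  sources:
    # Static, hardcoded certificate (for example the Cloudflare one).
    - inLine: |
        -----BEGIN CERTIFICATE-----
        ...static PEM data...
        -----END CERTIFICATE-----
    # CA managed by cert-manager; today this Secret must be in the
    # trust namespace, not in the ingress controller's namespace.
    - secret:
        name: internal-ca    # hypothetical Secret name
        key: tls.crt
  target:
    configMap:
      key: client-ca.pem     # hypothetical key name
    # Restrict the output to one namespace using the label Kubernetes
    # sets on every namespace; "ingress-origin" is a hypothetical name.
    namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: ingress-origin
```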

cert-manager-bot commented 3 days ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. /lifecycle stale