cert-manager / trust-manager

trust-manager is an operator for distributing trust bundles across a Kubernetes cluster.
https://cert-manager.io/docs/projects/trust-manager/
Apache License 2.0

Provide deterministic bundle #310

Closed Jiawei0227 closed 23 hours ago

Jiawei0227 commented 4 months ago

The trust bundle generated by trust-manager should be deterministic: if the order of the CA sources changes, or if there are multiple CA sources from multiple secrets and they are shuffled in the spec, the final generated trust bundle should always be the same, with the same ordering. Maybe by alphabetical order, expiration time order, or something similar.

Jiawei0227 commented 4 months ago

Cross link to: https://github.com/cert-manager/trust-manager/pull/303#issuecomment-1968587812

This means if a Bundle A looks like:

apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: trust-store
spec:
  sources:
  - secret:
      key: ca.crt
      name: ca1
  - secret:
      key: ca.crt
      name: ca2
  target:
    configMap:
      key: ca.crt

and Bundle B looks like this:

apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: trust-store
spec:
  sources:
  - secret:
      key: ca.crt
      name: ca2
  - secret:
      key: ca.crt
      name: ca1
  target:
    configMap:
      key: ca.crt

it should always produce the same content. Also if we move one CA from ca1 -> ca2, the result should be the same.

Jiawei0227 commented 4 months ago

@SgtCoDFish any thoughts on potentially getting this done? It sounds to me like we can just use alphabetical order when producing the final bundle, which should be good enough.

SgtCoDFish commented 4 months ago

I don't currently have the bandwidth to implement this, but I'd be happy to review a PR which does it! My 2c would be to hash the DER-encoded certs and then order them alphanumerically based on the hex-encoded hash
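A minimal sketch of that suggestion, written in Python for illustration only (trust-manager itself is not implemented this way as far as this thread shows). The function names `split_pem_to_der`, `der_to_pem` and `deterministic_bundle` are hypothetical, and dropping duplicate certificates is an extra assumption that the comment above does not ask for:

```python
import base64
import hashlib
import re

# Matches the base64 body of each PEM CERTIFICATE block in a text.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----\s*(.*?)\s*-----END CERTIFICATE-----",
    re.DOTALL,
)


def split_pem_to_der(pem_text):
    """Return the DER bytes of every CERTIFICATE block found in pem_text."""
    return [base64.b64decode("".join(body.split())) for body in PEM_BLOCK.findall(pem_text)]


def der_to_pem(der):
    """Re-encode DER bytes as a PEM block with 64-character lines."""
    b64 = base64.b64encode(der).decode("ascii")
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "-----BEGIN CERTIFICATE-----\n" + "\n".join(lines) + "\n-----END CERTIFICATE-----\n"


def deterministic_bundle(sources):
    """Build one bundle from several PEM sources, independent of source order.

    Each certificate is keyed by the hex SHA-256 of its DER encoding and the
    output is ordered by that key. Keying by hash also drops exact duplicates
    (an assumption of this sketch, not something stated in the thread).
    """
    by_hash = {}
    for pem_text in sources:
        for der in split_pem_to_der(pem_text):
            by_hash[hashlib.sha256(der).hexdigest()] = der
    return "".join(der_to_pem(by_hash[h]) for h in sorted(by_hash))
```

Because the sort key depends only on the certificate bytes, swapping the two sources or moving a CA from one secret to the other would leave the output unchanged. The sketch only base64-decodes the PEM body and does not validate that the bytes are a real X.509 certificate.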

Jiawei0227 commented 4 months ago

Not sure if anyone would be interested in picking this up, but it is a critical feature for us. The reason is that our component mounts the trust bundle ConfigMap while other automation reconciles the bundle. If the bundle data keeps getting reordered, that is very expensive and unnecessary.

erikgb commented 4 months ago

But the order is consistent now, and that's good/required. Are you planning to shuffle the sources around @Jiawei0227? I'm not saying this shouldn't be fixed, but I don't consider it critical. 😸

sebEg commented 6 days ago

Hi, we have a Bundle which includes six configMaps using a label selector:

apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: cluster-default-ca
spec:
  sources:
  - useDefaultCAs: true
  - configMap:
      selector:
        matchLabels: 
          cluster-default-ca: "true"
      key: "root-ca.pem"
  target:
    configMap:
      key: "root-certs.pem"
    additionalFormats:
      jks:
        key: "root-certs.jks"
      pkcs12:
        key: "root-certs.p12"

We also observe that the order of certificates changes with every reconciliation, leading to many unnecessary updates of the generated configmaps. While the output for root-certs.jks seems to be consistent, root-certs.pem and root-certs.p12 change with almost every reconciliation.

jan-kantert commented 6 days ago

@erikgb This caused three near misses for us. In two cases it caused etcd to run out of space within a short timespan (A). In two cases (one case had both) it caused etcd to use so much memory that our masters went OOM (B).

We use trust-manager to inject three configmaps (full CA certs) into each namespace. This happened in fairly small clusters (<30 namespaces). You can reproduce it by simply restarting trust-manager a few times. It will recreate those configmaps, etcd will grow by 1-2 GB and memory on the master will rise by about 3-6 GB.

Personally, I would consider this a critical issue. Currently, we have to massively overprovision our masters. On Azure this prevents you from using the smallest Kubernetes tier at all as it will kill the API. On AWS we had to roughly double our costs.

erikgb commented 6 days ago

I agree this is a serious issue, but probably relatively easy to fix. Any watchers that would like to try a PR to fix this?

jan-kantert commented 6 days ago

We investigated this a bit more. It turns out that our issue is caused by non-deterministic ordering in label selectors. This is what our bundle looks like:

apiVersion: trust.cert-manager.io/v1alpha1
kind: Bundle
metadata:
  name: xxx-default-ca
  namespace: my-ns
spec:
  sources:
  - useDefaultCAs: true
  - configMap:
      selector:
        matchLabels: 
          certs.infrastructure.mydomain.com/inject-default-ca: "true"
      key: "root-ca.pem"
  target:
    configMap:
      key: "root-certs.pem"
    additionalFormats:
      jks:
        key: "root-certs.jks"
      pkcs12:
        key: "root-certs.p12"

If we replace our selector with a static list this becomes less of an issue. With the selector the content seems to change every time.

> I agree this is a serious issue, but probably relatively easy to fix. Any watchers that would like to try a PR to fix this?

I will have a look if I can reproduce this in a test.

arsenalzp commented 3 days ago

What caused that issue?

maelvls commented 3 days ago

I see two different issues here:

I suggest that we don't conflate the feature request with the bug. I propose that the current issue (#310) keeps track of @Jiawei0227's feature request. @jan-kantert @sebEg can you create a separate issue for the bug you found?

jabdoa2 commented 3 days ago

I fixed both cases in #380. Both issues have the same underlying cause. The second case just triggers the issue far more frequently, since Kubernetes does not order configmaps when they are loaded via a label selector. We could also sort those configmaps. That would fix the second issue (unless you rename configmaps). However, just ordering the certs will fix both issues at once. I can also add a test for the second case, but it will be fixed as well.
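The difference between the two approaches can be illustrated with a small Python sketch. This is not the code from #380; the function names and the `(configmap_name, pem_text)` input shape are hypothetical. For brevity it assumes each configmap holds exactly one certificate and hashes the PEM text directly rather than the DER encoding:

```python
import hashlib


def concat_by_source_name(items):
    """Join certificates in configmap-name order.

    items is a list of (configmap_name, pem_text) pairs in whatever order the
    label-selector listing happened to return them.
    """
    return "".join(pem for _, pem in sorted(items, key=lambda item: item[0]))


def concat_by_cert_hash(items):
    """Join certificates ordered by a hash of their content, ignoring names."""
    certs = {pem for _, pem in items}
    return "".join(sorted(certs, key=lambda pem: hashlib.sha256(pem.encode()).hexdigest()))


# Two listings of the same configmaps in different order, plus a rename.
listing_1 = [("ca-b", "CERT-2\n"), ("ca-a", "CERT-1\n")]
listing_2 = [("ca-a", "CERT-1\n"), ("ca-b", "CERT-2\n")]
renamed = [("ca-z", "CERT-1\n"), ("ca-b", "CERT-2\n")]

# Sorting by name is stable across listing order, but not across a rename.
assert concat_by_source_name(listing_1) == concat_by_source_name(listing_2)
assert concat_by_source_name(renamed) != concat_by_source_name(listing_2)

# Sorting by certificate content is stable in both situations.
assert concat_by_cert_hash(listing_1) == concat_by_cert_hash(listing_2) == concat_by_cert_hash(renamed)
```

Sorting by name only pins down the order of sources, so anything that changes a name changes the bundle. Sorting the certificates themselves makes the result depend only on the set of certificates.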