cert-manager / cert-manager

Automatically provision and manage TLS certificates in Kubernetes
https://cert-manager.io
Apache License 2.0
12.13k stars · 2.09k forks

Let's encrypt certificate caching to mitigate rate limits problems #3298

Open fmasion opened 4 years ago

fmasion commented 4 years ago

Problem to solve: when you reinstall an app multiple times (with Helm, for example), you request a new certificate each time, and Let's Encrypt rate limits can be reached fairly quickly. The problem is that we consume limited Let's Encrypt calls, yet the existing certificates are discarded and replaced for no good reason: they are still valid.

The request: the idea is to have some kind of 'cert cache'. When I destroy my Ingress, the Secret is destroyed, but a copy remains available in the cache. When I create a new Ingress, the injector first looks for a valid existing cert before asking Let's Encrypt for a new one.

With this cache, some rate-limiting problems with Let's Encrypt could be mitigated.

This approach may be useful for other issuers too.

/kind feature

meyskens commented 4 years ago

Adding a cert cache would also mean a private key cache, which then becomes a security risk.

We could add an option to not delete the certificate and secret when an ingress is deleted, that might solve it?

fmasion commented 4 years ago

Thank you for answering so quickly

Well, I understand the risk aspect. Not destroying the Secret is (from my point of view) a kind of cache, because the general flow stays the same: 'does it already exist? If not, create it.'

I have no preference on how the 'cache' is implemented (cert-manager vault, a k8s Secret, something else?).

What's important is the result, because in the end the real risk is hitting the rate limit for bad reasons and having the service go down...

meyskens commented 4 years ago

I looked into my idea and it might be a bad one: we currently ignore certs that have no owner in the ingress-shim, and changing that would break people's setups. But I might have found an even better method. Helm has:

```yaml
metadata:
  annotations:
    "helm.sh/resource-policy": keep
```

You can put that on the ingress, however the nicest setup would be:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  annotations:
    "helm.sh/resource-policy": keep
  name: example-com
spec:
  secretName: example-com-tls
  dnsNames:
  - example.com
  - www.example.com
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    # no cert-manager annotations needed
  name: myIngress
spec:
  rules:
  - host: example.com
    http:
      paths:
      - backend:
          serviceName: myservice
          servicePort: 80
        path: /
  tls:
  - hosts:
    - example.com
    - www.example.com
    secretName: example-com-tls
```

fmasion commented 4 years ago

Hello Maartje,

This sounds pretty clever. I have to add a cleanup step for when a host is no longer needed, but it makes the certificate lifecycle clearer: create / update / delete instead of (re)create / delete.

I'll try that and give you feedback. Thanks, Fred

meyskens commented 4 years ago

@fmasion a PR to the documentation would be welcome about this! Can't imagine you're the only person in the world hitting this so it will help others :wink:

paultiplady commented 3 years ago

I'd like to add a different use-case example for this feature request.

I use review apps. I create a fresh namespace & copy of my app per Pull Request. I wire up certs and DNS on the review app so that it's a full production-like environment.

My ideal setup would be to have a fresh letsencrypt certificate per review-app, but that's not possible due to the weekly limit on cert issuance. So a fallback option would be to dynamically request a *.test.mydomain.com cert, and use that for each review app.

If I build the naive implementation and create a new Certificate for *.test.mydomain.com in each review-app namespace, then cert-manager will request a fresh cert each time, which will quickly hit the (5-per-week?) limit on cert re-issuance.

What I'd love to have is a configurable cache where cert-manager can detect that it's generated *.test.mydomain.com before, and just create the appropriate Secret that it generated the first time without re-issuing.

For my use case it would be perfectly acceptable to have cert-manager create a Secret in the cert-manager namespace as a way of storing the cached cert keys, since I use RBAC to control access to Secrets between namespaces. However I'm sure there are other approaches that would also work well.

For now my workaround is to buy a long-lived wildcard cert and inject that as a build secret, so I'm essentially unable to use letsencrypt/cert-manager to provision certificates for this use case.

Note the key difference in my case is that cert-manager is fulfilling Certificates in different namespaces, but if it kept cluster-level state it would be able to avoid making requests to the letsencrypt API. This would also solve the original request (even though there are other approaches to solving the issue that don't require cert-manager to do the caching).
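The cluster-level state described here boils down to "key each issued certificate by its set of DNS names and reuse it on a hit". A minimal sketch of that lookup logic in Python (this is illustrative only, not cert-manager's implementation; all names, including `fake_acme_issue`, are hypothetical):

```python
# Sketch of a cluster-level certificate cache keyed by requested DNS names.
class CertCache:
    def __init__(self):
        self._store = {}  # frozenset of dnsNames -> issued cert/key material

    def _key(self, dns_names):
        # DNS names are case-insensitive and order-independent
        return frozenset(name.lower() for name in dns_names)

    def get_or_issue(self, dns_names, issue_fn):
        """Return cached material for these names, or issue and cache it."""
        key = self._key(dns_names)
        if key not in self._store:
            # only one upstream (e.g. ACME) call per distinct name set
            self._store[key] = issue_fn(sorted(key))
        return self._store[key]

calls = []
def fake_acme_issue(names):
    calls.append(names)
    return {"tls.crt": "cert-for-" + ",".join(names), "tls.key": "redacted"}

cache = CertCache()
a = cache.get_or_issue(["*.test.mydomain.com"], fake_acme_issue)  # issues
b = cache.get_or_issue(["*.test.mydomain.com"], fake_acme_issue)  # cache hit
assert a is b and len(calls) == 1
```

The second request (the second review-app namespace, say) gets the cached material back without touching the issuer, which is exactly the property that avoids the re-issuance limit.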

abierbaum commented 3 years ago

@fmasion What method did you end up using? I am running into the exact same type of restriction. For me it is our QA/QC infrastructure where we test out new application deployments. As part of this we first clean the test K8S cluster of all application resources including the ingress(es) and then redeploy. That burns through certificate requests very quickly.

jakexks commented 3 years ago

/help
It would be nice to have a documentation PR for this setup.
/priority backlog

jetstack-bot commented 3 years ago

@jakexks: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/jetstack/cert-manager/issues/3298):

> /help
> It would be nice to have a documentation PR for this set up.
> /priority backlog

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

jetstack-bot commented 3 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Send feedback to jetstack. /lifecycle stale

jetstack-bot commented 3 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to jetstack. /lifecycle rotten /remove-lifecycle stale

FearlessHyena commented 2 years ago

I've been having the same issue. Has anyone found any workaround for this?

irbekrm commented 2 years ago

Secrets are not deleted when the Certificate/Ingress is deleted. (This behaviour may have changed since this issue was created.)

So in the case where:

1. a user creates an Ingress with cert-manager ingress-shim annotations
2. a certificate is issued and stored in a Secret
3. the user deletes the Ingress, which deletes the Certificate custom resource but not the Secret
4. the user re-creates the Ingress

a new Certificate custom resource will be created, but it will pick up the existing Secret, so the certificate will not be re-issued (provided nothing in the spec has changed, such as different DNS names) and this should not result in hitting the rate limits.

This is the default behaviour unless --enable-certificate-owner-ref flag has been set to true on cert-manager controller.

Perhaps the original issue was caused by the Secret being deleted for another reason, e.g. the namespace also being deleted in step 3?
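For reference, the ingress-shim flow above starts from an annotated Ingress like the following sketch (the ClusterIssuer name `letsencrypt-prod` and service names are placeholders):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    # ingress-shim creates a Certificate for the tls section below;
    # deleting this Ingress deletes that Certificate but, by default,
    # not the example-com-tls Secret
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - example.com
    secretName: example-com-tls
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: my-app
            port:
              number: 80
```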

> So a fallback option would be to dynamically request a *.test.mydomain.com cert, and use that for each review app.

For this, perhaps a solution would be to use a tool that can sync Secrets to newly created namespaces, such as kubed.
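With kubed, that sync is driven by an annotation on the source Secret. Roughly, per the kubed docs (the Secret name, namespace, and label selector here are placeholders, and the data values are stand-ins for real cert material):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: wildcard-test-mydomain-tls
  namespace: cert-manager
  annotations:
    # kubed copies this Secret into every namespace whose labels
    # match the selector
    kubed.appscode.com/sync: "env=review"
type: kubernetes.io/tls
data:
  tls.crt: ""  # placeholder
  tls.key: ""  # placeholder
```

Each review-app namespace labelled `env=review` would then receive a copy of the wildcard cert without a new issuance.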

jetstack-bot commented 2 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Send feedback to jetstack. /lifecycle stale

ak2766 commented 2 years ago

I wonder if Hierarchical Namespaces would help here where the secrets are stored in the parent namespace then synced down to children namespaces!

jetstack-bot commented 2 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to jetstack. /lifecycle rotten /remove-lifecycle stale

FearlessHyena commented 2 years ago

/remove-lifecycle rotten

BlackthornYugen commented 2 years ago

I'm also interested in a way to share/cache certificates & keys between namespaces.

UC 1 - Recreated namespace

In one use case I am frequently deleting / recreating a namespace to do a full suite of tests, but I don't want to request a new certificate. Imagine: feature1.app1.example.org in the feature1 namespace. I have a pipeline that creates the namespace, runs system tests, and deletes the namespace. It's convenient to be able to delete the whole namespace to ensure that the next tests are not bringing over any kind of state / configuration from the last run, but it would be nice to keep the same certificate.

UC 2 - Shared domain with many namespaces

In my other use case I have a root dns name that is shared between many different apps. Imagine app.example.org/auth goes to my keycloak namespace and app.example.org/app1/ goes to my app1 namespace. I have a dozen different apps that it is handy to keep in different namespaces but I'd like to be able to share the same certificate.

WoodyWoodsta commented 2 years ago

@BlackthornYugen The two use-cases you provide touch on a more underlying problem which is in Secrets and how they are not shareable across namespaces. If we were able to change the "output target" of certs to something like Hashicorp Vault, then this would not only solve the issue of both of your use cases, but also make the entire system much more secure (since secrets in k8s aren't exactly as secure as is warranted with TLS certificates).

Similar question here: https://github.com/cert-manager/cert-manager/issues/910

paultiplady commented 2 years ago

@WoodyWoodsta given that you can enable Encryption at Rest to set up KEK/DEK layered encryption for Secrets (much as Vault does), what's the problem with Secrets?

My understanding is that that KMS-encrypted secrets in GKE (for example) are about as safe as putting them in Vault. Is that not so?

https://kubernetes.io/docs/tasks/administer-cluster/kms-provider/ https://cloud.google.com/kubernetes-engine/docs/how-to/encrypting-secrets

The obvious problem with moving away from Secrets for cert-manager is you need a Secret to be able to plug into other k8s objects like Ingress; moving to another storage mechanism would break a bunch of existing cert-manager workflows. So while it's fine to offer non-Secret solutions as an alternative approach, I don't think it's acceptable to replace Secrets with something else. And so this issue would still need a Secret-specific solution.

As I noted above it's entirely possible for cert-manager to choose to cache certs in a Secret-centric way:

> For my usecase it would be perfectly acceptable to have cert-manager create a Secret in the cert-manager namespace as a way of storing the cached cert keys, since I use RBAC to control access to Secrets between namespaces. However I'm sure there are other approaches that would also work well.

Though of course, caching would be more work for cert-manager. As also noted above hierarchical namespaces could be another solution, if that ever gets upstreamed into k8s.

WoodyWoodsta commented 2 years ago

@paultiplady You make a good point. As it stands, our solution is to employ encryption at rest for Secrets via a Vault KMS, generate all certificates in a security-controlled namespace, and manually copy them into Vault for distribution around the cluster (with a view to automating this push to Vault in the future).

> moving to another storage mechanism would break a bunch of existing cert-manager workflows

Moving to another storage mechanism doesn't mean that Secrets are completely out of the picture for delivery to other parts of the cluster.

Out of the above, the biggest requirement is an instantiate-once, use-many-times pattern, which is the opposite of the typical k8s resource pattern of creating as needed. Of course this requirement ultimately comes from the LE rate limits.

WoodyWoodsta commented 2 years ago

To add: hierarchical namespaces would only be pushing potentially the same issue up to parent namespaces. If you were to then decide to create certificates at the root namespace (if that is how you architected your cluster), then this is no different from a theoretical "ClusterSecret".

paultiplady commented 2 years ago

> If you were to then decide to create certificates at the root namespace

I think the Hierarchical Namespace approach in general would be something like:

team1/Certificate
team1/test-env-1/Ingress
team1/test-env-2/Ingress
team2/Certificate
...

So team1 can have things like RBAC RoleBindings and Certificates shared into all their sub-namespaces, with permission to access team1/* but separated from other teams' namespaces?
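With the Hierarchical Namespace Controller, a sub-namespace such as team1/test-env-1 would be declared with a SubnamespaceAnchor along these lines (API group/version per the HNC docs; the namespace names are placeholders):

```yaml
apiVersion: hnc.x-k8s.io/v1alpha2
kind: SubnamespaceAnchor
metadata:
  # creates the child namespace "test-env-1" under the parent "team1",
  # which can then propagate objects like the shared Certificate's Secret
  name: test-env-1
  namespace: team1
```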

WoodyWoodsta commented 2 years ago

That's one way to structure, but fails as soon as team 2 needs team 1's certificate. This is likely if your company operates under a common domain. I think this still kicks the can down the road.

Anyway, after some research, I found https://github.com/emberstack/kubernetes-reflector, which copies Secrets into configured namespaces. So our current approach (which we're mostly happy with) is to use it to mirror the shared certificate Secret into the namespaces that need it.
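As a rough sketch of that pattern (annotation names per the Reflector README; the certificate name, issuer name, and namespace list are placeholders), cert-manager's `secretTemplate` can stamp the Reflector annotations onto the issued Secret so it gets mirrored automatically:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard-mydomain
  namespace: cert-manager
spec:
  secretName: wildcard-mydomain-tls
  dnsNames:
  - "*.mydomain.com"
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  secretTemplate:
    annotations:
      # Reflector mirrors the issued Secret into the listed namespaces
      reflector.v1beta1.emberstack.com/reflection-allowed: "true"
      reflector.v1beta1.emberstack.com/reflection-allowed-namespaces: "app1,app2"
      reflector.v1beta1.emberstack.com/reflection-auto-enabled: "true"
```

One certificate is issued once in the cert-manager namespace; every consuming namespace gets a copy of the Secret, so no extra Let's Encrypt calls are made.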

A proper caching mechanism would still be preferable since this pattern above goes against a purist k8s create-when-needed approach.

jetstack-bot commented 2 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Send feedback to jetstack. /lifecycle stale

jetstack-bot commented 1 year ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to jetstack. /lifecycle rotten /remove-lifecycle stale

sathyanarays commented 1 year ago

/remove-lifecycle rotten

jetstack-bot commented 1 year ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Send feedback to jetstack. /lifecycle stale

jetstack-bot commented 1 year ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to jetstack. /lifecycle rotten /remove-lifecycle stale

jetstack-bot commented 1 year ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to jetstack. /close

jetstack-bot commented 1 year ago

@jetstack-bot: Closing this issue.

In response to [this](https://github.com/cert-manager/cert-manager/issues/3298#issuecomment-1529639096):

> Rotten issues close after 30d of inactivity.
> Reopen the issue with `/reopen`.
> Mark the issue as fresh with `/remove-lifecycle rotten`.
> Send feedback to [jetstack](https://github.com/jetstack).
> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

robpearce-flux commented 1 year ago

/remove-lifecycle rotten

dwt commented 8 months ago

Is there a way to reopen this issue if it's not yet solved? We are running into the same problem here and would also like a solution.

Our use case, too, is that we have many namespaces that want the same certificate, and deleting and recreating namespaces wholesale to test deployments hits the API rate limits very quickly.

Also, should we ever need to re-setup the whole cluster, that would be a... excuse the pun .. clusterfuck.

wallrj commented 8 months ago

/reopen

jetstack-bot commented 8 months ago

@wallrj: Reopened this issue.

In response to [this](https://github.com/cert-manager/cert-manager/issues/3298#issuecomment-1963661929):

> /reopen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

ak2766 commented 7 months ago

In the end, I implemented a solution external to the cluster that copies the cert/key/chain from the Secret created by cert-manager to the filesystem. I then created a deployment wrapper script that first checks whether there is a valid certificate on the filesystem and uses it; otherwise it deploys the Certificate manifest. This has helped me avoid the rate limits.
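A wrapper along those lines might look roughly like this (a sketch, not the actual script; all file paths, Secret names, and the manifest filename are hypothetical placeholders):

```shell
#!/bin/sh
# Reuse a cached cert if it is still valid; otherwise let cert-manager issue one.
CERT_FILE="cached/example-org.crt"
KEY_FILE="cached/example-org.key"

# `openssl x509 -checkend N` exits 0 if the cert is still valid
# N seconds from now; here we require 30 more days of validity.
if [ -f "$CERT_FILE" ] && openssl x509 -in "$CERT_FILE" -noout -checkend $((30 * 24 * 3600)); then
  echo "reusing cached certificate"
  kubectl create secret tls example-org-tls \
    --cert="$CERT_FILE" --key="$KEY_FILE" \
    --dry-run=client -o yaml | kubectl apply -f -
else
  echo "no valid cached certificate; deploying the Certificate manifest"
  kubectl apply -f certificate.yaml
fi
```

The `-checkend` window should be longer than the renewal horizon you care about, so a cert that is about to expire is replaced rather than reused.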

carsonoid commented 6 months ago

For those that are hitting this I would recommend https://github.com/weave-lab/cached-certificate-operator as referenced in this other issue: https://github.com/cert-manager/cert-manager/issues/1500#issuecomment-1016929950

It is a small abstraction in front of Certificate resources that makes it really easy to share them across namespaces or duplicate them into multiple secrets.

cert-manager-bot commented 3 months ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. /lifecycle stale

dwt commented 3 months ago

/remove-lifecycle stale

cert-manager-bot commented 3 weeks ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. /lifecycle stale

dwt commented 1 week ago

/remove-lifecycle stale