fmasion opened this issue 4 years ago
Adding a cert cache would also mean a private key cache, which then becomes a security risk.
We could add an option to not delete the certificate and secret when an ingress is deleted, that might solve it?
Thank you for answering so quickly
Well, I understand the risk part of it. Not destroying the secret is, from my point of view, a kind of cache, because the general flow will still be the same: 'does it already exist? If not, create it.'
I have no preference on how the 'cache' is implemented (cert-manager, Vault, a k8s Secret, something else?).
What's important is the result, because in the end the real risk is hitting the rate limit for bad reasons and having the service down...
I looked into my idea and it might be a bad idea, as we now ignore certs that have no owner in the ingress-shim; changing that would break people's setups. But I might have found an even better method: Helm has
```yaml
metadata:
  annotations:
    "helm.sh/resource-policy": keep
```
You can put that on the Ingress; however, the nicest setup would be:
```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  annotations:
    "helm.sh/resource-policy": keep
  name: example-com
spec:
  secretName: example-com-tls
  dnsNames:
    - example.com
    - www.example.com
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    # no cert-manager annotations needed
  name: myIngress
spec:
  rules:
    - host: example.com
      http:
        paths:
          - backend:
              serviceName: myservice
              servicePort: 80
            path: /
  tls:
    - hosts:
        - example.com
        - www.example.com
      secretName: example-com-tls
```
Hello Maartje,
This sounds pretty clever. I have to add a cleaning step if the host is no longer needed, but it makes the certificate lifecycle clearer: create / update / delete instead of (re)create / delete.
I'll try that and give you feedback. Thanks, Fred
@fmasion a PR to the documentation would be welcome about this! Can't imagine you're the only person in the world hitting this so it will help others :wink:
I'd like to add a different use-case example for this feature request.
I use review apps. I create a fresh namespace & copy of my app per Pull Request. I wire up certs and DNS on the review app so that it's a full production-like environment.
My ideal setup would be to have a fresh letsencrypt certificate per review-app, but that's not possible due to the weekly limit on cert issuance. So a fallback option would be to dynamically request a `*.test.mydomain.com` cert, and use that for each review app.
If I build the naive implementation and create a new Certificate for `*.test.mydomain.com` in each review-app namespace, then cert-manager will request a fresh cert each time, which will quickly hit the (5-per-week?) limit on cert re-issuance.
What I'd love to have is a configurable cache where cert-manager can detect that it's generated `*.test.mydomain.com` before, and just create the appropriate Secret that it generated the first time without re-issuing.
For my usecase it would be perfectly acceptable to have cert-manager create a Secret in the `cert-manager` namespace as a way of storing the cached cert keys, since I use RBAC to control access to Secrets between namespaces. However I'm sure there are other approaches that would also work well.
For now my workaround is to buy a long-lived wildcard cert and inject that as a build secret, so I'm essentially unable to use letsencrypt/cert-manager to provision certificates for this usecase.
Note the key difference in my case is that cert-manager is fulfilling Certificates in different namespaces, but if it kept cluster-level state it would be able to avoid making requests to the letsencrypt API. This would also solve the original request (even though there are other approaches to solving the issue that don't require cert-manager to do the caching).
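(For illustration, a minimal sketch of the kind of per-namespace RBAC described above; all names here are hypothetical:)

```yaml
# Sketch: restrict Secret reads to within a single review-app namespace.
# The namespace (review-app-1) and ServiceAccount (app-deployer) are
# hypothetical names for illustration.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: secret-reader
  namespace: review-app-1
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: secret-reader
  namespace: review-app-1
subjects:
  - kind: ServiceAccount
    name: app-deployer
    namespace: review-app-1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: secret-reader
```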
@fmasion What method did you end up using? I am running into the exact same type of restriction. For me it is our QA/QC infrastructure where we test out new application deployments. As part of this we first clean the test K8S cluster of all application resources including the ingress(es) and then redeploy. That burns through certificate requests very quickly.
/help
It would be nice to have a documentation PR for this setup.
/priority backlog
@jakexks: This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed by commenting with the `/remove-help` command.
Issues go stale after 90d of inactivity.
Mark the issue as fresh with `/remove-lifecycle stale`.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with `/close`.
Send feedback to jetstack.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with `/close`.
Send feedback to jetstack.
/lifecycle rotten
/remove-lifecycle stale
I've been having the same issue. Has anyone found any workaround for this?
Secrets are not deleted when the Certificate/Ingress are deleted (this behaviour may have changed since this issue was created).
So in the case where:
1) a user creates an Ingress with cert-manager ingress-shim annotations,
2) a certificate is issued and stored in a Secret,
3) the user deletes the Ingress (this results in the Certificate custom resource being deleted, but not the Secret),
4) the user re-creates the Ingress,

a new Certificate custom resource will be created, but it will pick up the existing Secret, so the certificate will not be re-issued (given that there hasn't been a change in the spec, such as different DNS names) and this should not result in hitting the rate limits.
This is the default behaviour unless the `--enable-certificate-owner-ref` flag has been set to true on the cert-manager controller.
Perhaps the original issue was caused by the Secret being deleted due to e.g. the namespace also being deleted in step 3?
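(As a sketch of where that flag lives, assuming the upstream Helm chart's `extraArgs` passthrough to the controller:)

```yaml
# values.yaml sketch for the cert-manager Helm chart (assumes the chart's
# extraArgs passthrough). With this flag set, Secrets are garbage-collected
# together with their Certificate -- the opposite of the Secret-preserving
# default behaviour described above -- so for "caching" you'd leave it unset.
extraArgs:
  - --enable-certificate-owner-ref=true
```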
> So a fallback option would be to dynamically request a *.test.mydomain.com cert, and use that for each review app.

For this, perhaps a solution would be to use some tool that can sync Secrets to the newly created namespaces, such as kubed.
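(A minimal sketch of the kubed approach, assuming the wildcard cert's Secret lives in the `cert-manager` namespace and that review-app namespaces carry a hypothetical `team=review` label:)

```yaml
# Sketch: kubed copies a Secret into every namespace matching the label
# selector in the sync annotation. The Secret name and namespace label
# are assumptions for illustration.
apiVersion: v1
kind: Secret
metadata:
  name: wildcard-test-mydomain-tls
  namespace: cert-manager
  annotations:
    kubed.appscode.com/sync: "team=review"
type: kubernetes.io/tls
data: {}  # tls.crt / tls.key populated by cert-manager
```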
/lifecycle stale
I wonder if Hierarchical Namespaces would help here, where the secrets are stored in the parent namespace and then synced down to child namespaces!
/lifecycle rotten
/remove-lifecycle stale
/remove-lifecycle rotten
I'm also interested in a way to share/cache certificates & keys between namespaces.
In one use case I am frequently deleting / recreating a namespace to do a full suite of tests, but I don't want to request a new certificate. Imagine: feature1.app1.example.org in the feature1 namespace. I have a pipeline that creates the namespace, runs system tests, and deletes the namespace. It's convenient to be able to delete the whole namespace to ensure that the next tests are not bringing over any kind of state / configuration from the last run, but it would be nice to keep the same certificate.
In my other use case I have a root dns name that is shared between many different apps. Imagine app.example.org/auth goes to my keycloak namespace and app.example.org/app1/ goes to my app1 namespace. I have a dozen different apps that it is handy to keep in different namespaces but I'd like to be able to share the same certificate.
@BlackthornYugen The two use-cases you provide touch on a deeper underlying problem, which is that Secrets are not shareable across namespaces. If we were able to change the "output target" of certs to something like HashiCorp Vault, then this would not only solve both of your use cases, but also make the entire system much more secure (since secrets in k8s aren't exactly as secure as is warranted for TLS certificates).
Similar question here: https://github.com/cert-manager/cert-manager/issues/910
@WoodyWoodsta Given that you can enable Encryption at Rest to set up KEK/DEK layered encryption for Secrets (much as Vault does), what's the problem with Secrets?
My understanding is that KMS-encrypted secrets in GKE (for example) are about as safe as putting them in Vault. Is that not so?
https://kubernetes.io/docs/tasks/administer-cluster/kms-provider/
https://cloud.google.com/kubernetes-engine/docs/how-to/encrypting-secrets
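(For reference, envelope encryption of Secrets is configured on the kube-apiserver with an EncryptionConfiguration; a minimal sketch, assuming a KMS v2 plugin listening on the named socket, with the plugin name and socket path as assumptions:)

```yaml
# Sketch: kube-apiserver EncryptionConfiguration using a KMS plugin for
# Secrets (KEK held by the external KMS, per-object DEKs handled by the
# plugin). Plugin name and endpoint are hypothetical.
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - kms:
          apiVersion: v2
          name: my-kms-plugin
          endpoint: unix:///var/run/kms-plugin.sock
          timeout: 3s
      - identity: {}  # fallback so pre-existing unencrypted data stays readable
```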
The obvious problem with moving away from Secrets for cert-manager is that you need a Secret to be able to plug into other k8s objects like Ingress; moving to another storage mechanism would break a bunch of existing cert-manager workflows. So while it's fine to offer non-Secret solutions as an alternative approach, I don't think it's acceptable to replace Secrets with something else. And so this issue would still need a Secret-specific solution.
As I noted above, it's entirely possible for cert-manager to choose to cache certs in a Secret-centric way:
> For my usecase it would be perfectly acceptable to have cert-manager create a Secret in the `cert-manager` namespace as a way of storing the cached cert keys, since I use RBAC to control access to Secrets between namespaces. However I'm sure there are other approaches that would also work well.
Though of course, caching would be more work for cert-manager. As also noted above, hierarchical namespaces could be another solution, if that ever gets upstreamed into k8s.
@paultiplady You make a good point, and as it stands, our solution is to employ encryption at rest for secrets via a Vault KMS, generate all certificates in a security-controlled namespace, and manually copy them into Vault for distribution around the cluster (with a view to automating this push to Vault in the future).
> moving to another storage mechanism would break a bunch of existing cert-manager workflows
Moving to another storage mechanism doesn't mean that Secrets are completely out of the picture for delivery to other parts of the cluster.
Out of the above, the biggest requirement is for an instantiate-once, use-many-times pattern, which is the opposite of a typical resource pattern in k8s: create as needed. Of course this requirement ultimately comes from the rate limits from LE.
To add: hierarchical namespaces would potentially just push the same issue up to parent namespaces. If you were then to create certificates at the root namespace (if that is how you architected your cluster), this is no different from a theoretical "ClusterSecret".
> If you were to then decide to create certificates at the root namespace
I think the Hierarchical Namespace approach in general would be something like:

```
team1/Certificate
team1/test-env-1/Ingress
team1/test-env-2/Ingress
team2/Certificate
...
```

So team1 can have things like RBAC RoleBindings and Certificates shared into all their sub-namespaces, with permission to access `team1/*` but separated from other teams' namespaces?
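(For illustration, with the Hierarchical Namespace Controller that structure would be built from subnamespace anchors, and Secret propagation has to be opted into explicitly; a sketch assuming HNC's v1alpha2 API and the namespace names above:)

```yaml
# Sketch: a subnamespace of team1, plus the cluster-wide HNC configuration
# that opts Secrets into propagation (HNC does not propagate them by default).
apiVersion: hnc.x-k8s.io/v1alpha2
kind: SubnamespaceAnchor
metadata:
  name: test-env-1
  namespace: team1
---
apiVersion: hnc.x-k8s.io/v1alpha2
kind: HNCConfiguration
metadata:
  name: config
spec:
  resources:
    - resource: secrets
      mode: Propagate
```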
That's one way to structure it, but it fails as soon as team 2 needs team 1's certificate. This is likely if your company operates under a common domain. I think this still kicks the can down the road.
Anyway, after some research, I found https://github.com/emberstack/kubernetes-reflector which will copy secrets into configured namespaces. So our current approach (which we're mostly happy with) is to generate certificates in the `cert-manager` namespace, which will have strict access control, and let the reflector copy the resulting Secrets into the namespaces that need them. A proper caching mechanism would still be preferable, since the pattern above goes against a purist k8s create-when-needed approach.
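(A minimal sketch of that pattern, using cert-manager's `secretTemplate` to stamp the reflector annotations onto the generated Secret; the issuer name and the namespace list are assumptions:)

```yaml
# Sketch: a Certificate whose Secret is annotated (via secretTemplate) so
# kubernetes-reflector mirrors it into the listed namespaces. Resource
# names and the namespace list are hypothetical.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: shared-wildcard
  namespace: cert-manager
spec:
  secretName: shared-wildcard-tls
  dnsNames:
    - "*.test.mydomain.com"
  issuerRef:
    name: letsencrypt-prod   # hypothetical ClusterIssuer name
    kind: ClusterIssuer
  secretTemplate:
    annotations:
      reflector.v1.k8s.emberstack.com/reflection-allowed: "true"
      reflector.v1.k8s.emberstack.com/reflection-allowed-namespaces: "app1,app2"
      reflector.v1.k8s.emberstack.com/reflection-auto-enabled: "true"
```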
/lifecycle stale
/lifecycle rotten
/remove-lifecycle stale
/remove-lifecycle rotten
/lifecycle stale
/lifecycle rotten
/remove-lifecycle stale
Rotten issues close after 30d of inactivity.
Reopen the issue with `/reopen`.
Mark the issue as fresh with `/remove-lifecycle rotten`.
Send feedback to jetstack.
/close
@jetstack-bot: Closing this issue.
/remove-lifecycle rotten
Is there a way to reopen this issue if it's not yet solved? We are running into the same problem here and would like a solution too.
Our use case too is that we have too many namespaces that want the same certificate, and especially when deleting and recreating a namespace completely to test deployments, that hits API rate limits very quickly.
Also, should we ever need to re-setup the whole cluster, that would be a... excuse the pun... clusterfuck.
/reopen
@wallrj: Reopened this issue.
In the end, I implemented a solution external to the cluster that copies the cert/key/chain from the secret created by cert-manager to the filesystem. I then created a deployment wrapper script that first checks whether there's a valid certificate on the filesystem and uses that; otherwise it deploys the certificate manifest. This has helped me avoid the rate limits.
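(For reference, the extraction step in such a wrapper boils down to something like `kubectl get secret example-com-tls -o jsonpath='{.data.tls\.crt}' | base64 -d > tls.crt`, and the same for `tls.key`; the secret name here is just the one from the earlier example in this thread.)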
For those that are hitting this I would recommend https://github.com/weave-lab/cached-certificate-operator as referenced in this other issue: https://github.com/cert-manager/cert-manager/issues/1500#issuecomment-1016929950
It is a small abstraction in front of Certificate resources that makes it really easy to share them across namespaces or duplicate them into multiple secrets.
/lifecycle stale
/remove-lifecycle stale
/lifecycle stale
/remove-lifecycle stale
**Problem to solve**
When you reinstall an app multiple times (with Helm, for example), each time you ask for a new certificate. Let's Encrypt rate limits can be reached fairly quickly. The problem is that we consume limited Let's Encrypt calls, yet the certificates are discarded for new ones for no good reason: they are still valid.

**The request**
The idea is to have some 'cert cache'. So when I destroy my ingress the secret is destroyed, but a copy is still available in the cache. When I create a new ingress, the injector first looks whether a valid cert already exists before asking Let's Encrypt for a new one.
With this cache, some rate-limiting problems with Let's Encrypt could be mitigated.
This approach may be useful for other issuers too.
/kind feature