cert-manager / cert-manager

Automatically provision and manage TLS certificates in Kubernetes
https://cert-manager.io
Apache License 2.0

docs: Add info about client side certificate rotation best practices. #1168

Open bwplotka opened 5 years ago

bwplotka commented 5 years ago

Hi and Happy New Year All!

Thanks for the great product. We have used it in production for a long time, and we now want to improve automation and avoid manual intervention during certificate renewal for our services. How do we ensure that a Pod's server will actually reload a renewed certificate?

This is definitely not a cert-manager issue, but it would be nice for cert-manager to include potential solutions to this problem as best practices.

There are multiple options, such as:

A) Ensure the application can reload the certificate in a "hitless" (non-disruptive) way. E.g. you can implement that for a Golang HTTP server, or hope that the service you use allows it (mostly they don't). For example, Envoy recently added that option: https://github.com/envoyproxy/envoy/issues/1194
B) Use some generic cert-rotate operator that performs a rolling restart of stateless deployments so they load new certificates. Maybe logic like this makes sense in cert-manager itself?
C) Have your rollout tools handle it (ensure pods are restarted frequently).
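
To make option A concrete, here is a minimal sketch of a hot-reloading Golang HTTPS server. It assumes the certificate Secret is mounted into the Pod as files; the names `certReloader` and `newServer` are hypothetical and only for illustration. It relies on the standard library's `tls.Config.GetCertificate` hook and re-reads the key pair whenever the certificate file's modification time changes. Treat it as one possible approach under those assumptions, not a recommended implementation.

```go
package main

import (
	"crypto/tls"
	"net/http"
	"os"
	"sync"
	"time"
)

// certReloader (hypothetical name) caches a key pair and re-reads it
// from disk when the certificate file's modification time changes.
type certReloader struct {
	certFile, keyFile string

	mu      sync.Mutex
	cert    *tls.Certificate
	modTime time.Time
}

// GetCertificate has the signature expected by tls.Config.GetCertificate,
// so it is called on every TLS handshake.
func (r *certReloader) GetCertificate(_ *tls.ClientHelloInfo) (*tls.Certificate, error) {
	// os.Stat follows symlinks, which matters for mounted Secret volumes.
	info, err := os.Stat(r.certFile)
	if err != nil {
		return nil, err
	}

	r.mu.Lock()
	defer r.mu.Unlock()

	if r.cert == nil || !info.ModTime().Equal(r.modTime) {
		pair, err := tls.LoadX509KeyPair(r.certFile, r.keyFile)
		if err != nil {
			if r.cert != nil {
				// Keep serving the previous pair if the new one is unreadable.
				return r.cert, nil
			}
			return nil, err
		}
		r.cert = &pair
		r.modTime = info.ModTime()
	}
	return r.cert, nil
}

// newServer (hypothetical helper) wires the reloader into an http.Server.
func newServer(addr, certFile, keyFile string) *http.Server {
	r := &certReloader{certFile: certFile, keyFile: keyFile}
	return &http.Server{
		Addr:      addr,
		TLSConfig: &tls.Config{GetCertificate: r.GetCertificate},
	}
}
```

With this in place, the server can be started with `newServer(":8443", certPath, keyPath).ListenAndServeTLS("", "")`; the file name arguments may be empty because `GetCertificate` is set. Only new handshakes pick up the renewed certificate, so existing connections are not interrupted. The sketch stats the file on every handshake, which is simple but adds a small per-connection cost; a file watcher or a periodic timer would be an alternative.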

What is the common way of solving this problem? I guess A gives the least disruptive rotation possible, but what if it's a 3rd-party tool that does not support hot reload? I have searched the GitHub issues but haven't found a relevant response.

Do you agree that some docs on best practices for this would be suitable in the cert-manager documentation?

/kind feature

munnerz commented 5 years ago

Happy new year! :tada:

I think this sort of thing would be great to add to our documentation - or at least notes summarising what you've put above, so that users can understand what they need to do and what their options are 😄

/kind documentation

paultiplady commented 5 years ago

This is one of those subtle issues that isn't apparent from reading the intro docs, and will cause a full outage when it bites you. I think it's worth at least calling out as a "here be dragons" kind of message; whatever your chosen solution, if you haven't picked one, then you are probably going to have an outage at some point (usually coinciding with when your team is all on vacation, since that's when the code/deploy velocity will have dropped off).

(Not being overly-specific because this is exactly what happened to me or anything like that...) :)

bwplotka commented 5 years ago

@paultiplady, I don't get what the actual outcome of your comment is (: Are you just ranting about the fact that nothing works 100%? Sure, but can we focus on fixing this issue, to recommend or explain a solution that will be closer to 100% than others?

paultiplady commented 5 years ago

I'm adding a user use-case emphasizing that this is important to document, as it produces outages if it's not handled.

rmb938 commented 5 years ago

So I just found this: https://github.com/pusher/wave. It watches for changes to the configmaps and secrets used by deployments and performs a rolling deploy when they get updated. So, going off the example from the initial issue, the following would happen:

  1. Create a Certificate
  2. Create a deployment with the wave annotation and use the certificate's secret in the deployment
  3. Cert-manager renews and updates Kubernetes secret
  4. Wave sees that the secret was updated and performs a rolling deployment.
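
The general idea behind steps 3 and 4 can be sketched as follows. This is an illustration of hash-based change detection only, not Wave's actual code: the annotation key `example.com/secret-hash` and the function names are hypothetical. The assumption is that a controller records a digest of the Secret's data on the Deployment's pod template, so that a renewed certificate changes the digest and therefore triggers a rolling update.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"sort"
)

// hashAnnotation is a hypothetical annotation key used for illustration.
const hashAnnotation = "example.com/secret-hash"

// secretHash returns a deterministic SHA-256 digest of a Secret's data.
// Keys are sorted first because Go map iteration order is random.
func secretHash(data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		// A zero byte separates keys and values; good enough for a sketch.
		h.Write([]byte(k))
		h.Write([]byte{0})
		h.Write(data[k])
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

// needsRollout reports whether the digest recorded in the pod template
// annotations differs from the digest of the current Secret data.
func needsRollout(annotations map[string]string, data map[string][]byte) bool {
	return annotations[hashAnnotation] != secretHash(data)
}
```

When `needsRollout` returns true, a controller would write the new digest into the pod template annotations. Because that changes the pod template, the Deployment controller then replaces the pods through its normal rolling update strategy.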

bwplotka commented 5 years ago

Nice, if that is production-ready then it looks really promising!

retest-bot commented 5 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Send feedback to jetstack. /lifecycle stale

retest-bot commented 5 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to jetstack. /lifecycle rotten /remove-lifecycle stale

retest-bot commented 5 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to jetstack. /close

jetstack-bot commented 5 years ago

@retest-bot: Closing this issue.

In response to [this](https://github.com/jetstack/cert-manager/issues/1168#issuecomment-517938260):

>Rotten issues close after 30d of inactivity.
>Reopen the issue with `/reopen`.
>Mark the issue as fresh with `/remove-lifecycle rotten`.
>Send feedback to [jetstack](https://github.com/jetstack).
>/close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

PHameete commented 4 years ago

I currently have to implement a solution for this as well and saw the recommendation for Wave above. I also ran into https://github.com/stakater/Reloader, which does the same thing but has more stars and looks easier to install.

munnerz commented 4 years ago

/reopen
/remove-lifecycle rotten
/lifecycle frozen

jetstack-bot commented 4 years ago

@munnerz: Reopened this issue.

In response to [this](https://github.com/jetstack/cert-manager/issues/1168#issuecomment-601658871):

>/reopen
>/remove-lifecycle rotten
>/lifecycle frozen

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.