eclipse-che / che

Kubernetes based Cloud Development Environments for Enterprise Teams
http://eclipse.org/che
Eclipse Public License 2.0
6.99k stars 1.19k forks source link

[devworkspace] Generate webhook server cert with job on kubernetes #17855

Closed sleshchenko closed 4 years ago

sleshchenko commented 4 years ago

Is your enhancement related to a problem? Please describe.

Currently. devworkspace controller with webhooks enabled needs certificate. On the OpenShift we use out-of-the-box to generate such a certificate, for the Kubernetes we have pre-install step to generate those certificates and create them on the cluster. It's needed to rework that pre-install step to be performed on the same level, as for OpenShift infra: webhook server deploying. To generate certificates it's proposed to reuse the same job as Che Operators uses https://github.com/eclipse/che-operator/blob/master/pkg/deploy/tls.go#L277

l0rd commented 4 years ago

Why aren't we using certmanager on Kubernetes? It looks like the industry-trusted tool to manage certificates on Kubernetes.

sleshchenko commented 4 years ago

Better to ask deploy team. I assume we don't want to be dependent on cert-manager since it's not something we can easily configure out-of-the-box in 100% of cases. So, a job is a really light-weight option to go and there is still an ability for user to precreate/configure TLS secrets created in solution whichever user wants, cert manager or any other. We can reuse user's generated TLS cert for devworkspace controller but secrets should have the distinct names. To improve that we may want to make these secrets name configurable.

@tolusha @mmorhun Do you have any comments why we moved from cert-manager on operator/chectl side?

mmorhun commented 4 years ago

At the first iteration of bringing TLS by default into Eclispe Che we used Cert-Manager with helm installer. At that point it sounded like a perfect solution. However, some experience showed, that it has own downsides. What I recall:

So, for operator we decided to use our own lightweight job which just have work done. Operator launches it, if there is no existing certificate, so a user can just precreate one manually or via Cert-Manager or another tool and Che will use it. It is possible to use Cert-Manager, the question is in balance between required effort to implement full integration and then maintain it vs functionality it provides.

sleshchenko commented 4 years ago

P.S. Depending on cert-manager operator means be blocked to support OpenShift 4.6 until they migrate to the new bundle format ) And it even does not seems to be present on operatorhub.io https://operatorhub.io/?keyword=manager The referenced operator in the doc returns 404 https://cert-manager.io/docs/installation/openshift/#installing-with-cert-manager-operator

mmorhun commented 4 years ago

I was always wondering how it is possible to accidentally close issue. It is. Sorry, reopened.

l0rd commented 4 years ago

@sleshchenko I am sorry for not spotting that before but I don't want us to make bad decisions that we may need to pay in the future. We are not in the certificate management business. Maintaining a certificate manager is something I don't want us to deal with. I want to be really sure that there are no better alternatives.

P.S. Depending on cert-manager operator means be blocked to support OpenShift 4.6 until they migrate to the new bundle format ) And it even does not seems to be present on operatorhub.io operatorhub.io/?keyword=manager The referenced operator in the doc returns 404 cert-manager.io/docs/installation/openshift/#installing-with-cert-manager-operator

On OpenShift we should not use certmanager but the cert signing service.

@mmorhun I am not sure if certmanager is the right tool (maybe cfssl is a better choice?) but we definitely should avoid using che-cert-manager-ca-cert-generator-image because that's an hack that works for a PoC.

  • It requires a root CA key/certificate if we need server certificate, so there is still a need in a Kubernetes job to generate it.

I am perfectly ok if we provide a link to the instructions for configuring certmanager. We should not generate the CA key/certificate, the user should do that with its PKI infra.

  • Cert-Manager is deployed via yaml manifest, so it is not easy to customise. For example, it requires cert-manager namespace to be installed into. But sometimes user who installs Che doesn't have the permission to do so and installation just fails. Theoretically we can alter the yamls, but it doesn't sound like a good solution...

Che operator should not install/configure certmanager. It should be a pre-requisite.

  • We need to manually maintain it. From deployment (wait and check that all its components are deployed and ready) to conflicts if it is already installed and maybe has different settings.

Sure but the effort is justified because we will support an industry standard. Whereas maintaining our certificate signer image is something that have less value: customers cannot really use it in production (we do not manage certs rotations, revocation etc...).

  • Cert-Manager has own set of resources (and some pods always running) which consumes resources and has notable impact on Che installation time (mostly due to image pulling, but its startup time also counts).

Again: that should not be part of Che installation time.

sleshchenko commented 4 years ago

@l0rd I'm OK if we'll try to find a better approach. I believe we started self-signed certs support with tls secrets as prerequisites, we had documentation with an article on how to set up self-signed certs. But at the time we were enabling TLS by default - we came to the issue that we want to automate certificates setting up process with chectl server:start, cause it's not convenient to manually set up certs over and over once you install minikube for example.

Whereas maintaining our certificate signer image is something that have less value: customers cannot really use it in production (we do not manage certs rotations, revocation etc...).

Good point! But we still benefit from using cert-gen job while local testing, as well as users who just getting familiar with Che on their minikubes (if any =) ) Should it be enough to get rid of responsibility for TLS in production if we ?:

If we agree on some plan like above - on devworkspace controller side, we won't make cert gen as controller part and like previously we'll generate certificates as deployment prerequisites(now one for webhook server, later +1 for workspaces endpoints), with a non-production makefile rule that will init certificates on minikube;

Devworkspace Operator aspect: The only difference I see here: Che operator can be installed properly without TLS certificates and then fail/expose error during setting up CheCluster CR but DevWorkspace Operator is going to fail during installing without TLS certs, unless we agree that we don't support webhooks on the K8s cluster(which mean creator access only is omitted) (I don't think it's typical for operator to have such any pre-requisites, any example for such operator?) AFAIK OLM could provide certificates for webhook manager for us (not sure it's implemented for k8s) but if we use OLM for that - we have unsecured workspaces-resources(pods) once DevWorkspace Operator is uninstalled (again creator access only); Hopefully, at time we'll provide DevWorkspace Operator on K8s - OLM will solve all the issues for us, like provide certs (if it's not there yet) + clean up CR along with all related resources on opetaror uninstalling (Angel already created an issue on OLM side).

P.S. Sorry for too many words, I just tried to be clear enough and it does not seem to be a simple topic where I can be short at well. cc @amisevsk @JPinkney

l0rd commented 4 years ago

I don't understand. The nginx ingress controller comes with a default TLS certificate. If the secret in the Ingress is omitted then the nginx default one will be used. What's the motivation that pushed us to generate an untrusted certificate when we could use nginx default one? I am talking mainly of minikube and I have an hello world sample here that works on minikube.

l0rd commented 4 years ago

For devworkspace I would disable the webhook if no cert+key is provided and the cluster doesn't provide a cert signing service.

tolusha commented 4 years ago

@mmorhun WDYT ?

sleshchenko commented 4 years ago

The nginx ingress controller comes with a default TLS certificate. If the secret in the Ingress is omitted then the nginx default one will be used.

Thanks for sharing. For some reason I missed that feature at time I worked with self-signed certs. I tested it and as far as I see, despite of the ability for a user to add an exclusion in the browser it's not a valid self-signed certificate for Che: Screenshot_20201008_103928

For compare: Che generated self-signed cert details ![Screenshot_20201008_104519](https://user-images.githubusercontent.com/5887312/95429738-74b6ba80-0953-11eb-9408-44231f301704.png)

I think it means that we won't be able to configure properly Che components'(plugin broker, Che Server, Theia) trust stores, it's going to fail if we propagate default nginx cert CA because of DNS mismatch. The only way we can make Che working with such default cert - configure everything to ignore TLS verification.

P.S.

Nginx generates certificates for its webhook server with https://github.com/jet/kube-webhook-certgen job on minikube ![Screenshot_20201008_110910](https://user-images.githubusercontent.com/5887312/95432038-c6ad0f80-0956-11eb-9157-104efaa8f048.png) ![Screenshot_20201008_122830](https://user-images.githubusercontent.com/5887312/95440586-cbc38c00-0961-11eb-9dd7-96c71c6d8c1f.png) ![Screenshot_20201008_122720](https://user-images.githubusercontent.com/5887312/95440482-aafb3680-0961-11eb-8d74-dcdc4a3148f9.png) I'm not trying to use it as argument to go that way, it's just a fact
mmorhun commented 4 years ago

What's the motivation that pushed us to generate an untrusted certificate when we could use nginx default one? I am talking mainly of minikube and I have an hello world sample here that works on minikube.

@l0rd it was done to have valid self-signed certificate. In case of nginx default certificate we have DNS mismatch. Moreover, some tools require CA certificate, but nginx default one is just self-signed server certificate.

l0rd commented 4 years ago

@sleshchenko @mmorhun you are right about the default ngninx cerficate. I have created a separate issue for a kube cluster with a valid certificate though #18079. And with single host + usage of internal services rather than ingresses we should not need to generate a cert even with that invalid cert (I will log a separate issue for that too).

davidfestal commented 4 years ago

P.S. Depending on cert-manager operator means be blocked to support OpenShift 4.6 until they migrate to the new bundle format ) And it even does not seems to be present on operatorhub.io operatorhub.io/?keyword=manager The referenced operator in the doc returns 404 cert-manager.io/docs/installation/openshift/#installing-with-cert-manager-operator

On a 4.5.4 CRC cluster, see 2 cert-manager proposed in the OperatorHub:

I also tested on a 4.6.0.rc0 cluster started with cluster-bot, and it has the same 2 cert-manager operators available. I could install the certified one without any problem.

As for the fact that is is not on the Operator.io website, let's remember that this site is for the K8S version of operators (not for the Openshift-dedicated nor certified ones).

sleshchenko commented 4 years ago

David thanks for sharing info. It's was mistake to reference OpenShift 4.6. I mainly wanted to solve an issue with a certificate manager for devworkspace operator on kubernetes.

sleshchenko commented 4 years ago

@l0rd We merged the PR since a pretty good job is done there and it solves our issue with multiple deployments support for k8s and OpenShift. Thank you for raising an issue with preferences not to use Che dependent cert gen job. I've created a separate issue to agree on a better alternative https://github.com/devfile/devworkspace-operator/issues/170 But at this time of alpha, it's not critical. Priority is integration with Che.

l0rd commented 4 years ago

thank you @sleshchenko

davidfestal commented 4 years ago

Thanks for the details @sleshchenko. Just want to point to the last documentation about webhooks in Kubebuilder, which is now completely the basis of the new OperatorSDK 1.0: https://book.kubebuilder.io/cronjob-tutorial/running.html and https://book.kubebuilder.io/cronjob-tutorial/cert-manager.html But surely you already know this doc.

amisevsk commented 4 years ago

Total aside, but I'm worried about updating to Operator SDK 1.0 -- I poked a bit at it last week and it seems like it could potentially be a very error-prone process (the upgrade guide is "bootstrap a new project and copy file contents over").