eclipse-che / che

Kubernetes based Cloud Development Environments for Enterprise Teams
http://eclipse.org/che
Eclipse Public License 2.0
6.95k stars 1.19k forks

K3s support #12973

Open sr229 opened 5 years ago

sr229 commented 5 years ago

Description

K3s is a Kubernetes distribution intended for IoT, edge computing, and anyone who wants a minimal Kubernetes experience (à la HyperKube, without the pain).

However, K3s removes the following:

With this in mind, we need a separate manifest for deploying on K3s, as many of the resources we use are not at v1 API stability.

che-bot commented 4 years ago

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

amisevsk commented 4 years ago

FWIW I don't believe we do anything outside the v1 API anymore, since deployments moved out of v1beta2

sr229 commented 4 years ago

@amisevsk I believe we use Ingresses, which aren't a v1 API. This'll break K3s.

amisevsk commented 4 years ago

@sr229 You are right, I missed that one.

sr229 commented 4 years ago

I think at this rate we can keep the same chart, but we'd have to revise the Ingress resources to use a Service of type LoadBalancer instead. I'm not exactly sure whether that gives us parity with what we have.

petzsch commented 4 years ago

It is indeed possible to set up Che on k3s. You just need to make sure that you don't deploy traefik when setting up the server: curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --no-deploy traefik" sh
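A note on that flag: in more recent k3s releases `--no-deploy` was deprecated in favour of `--disable`, so the equivalent install line on a current k3s would look like this (a sketch, not verified against every k3s version):

```shell
# Skip the bundled traefik ingress controller on a newer k3s release.
# (`--no-deploy traefik` is the older spelling of the same option.)
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --disable traefik" sh -
```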

Otherwise I pretty much followed the Azure instructions here: http://www.eclipse.org/che/docs/che-7/installing-eclipse-che-on-microsoft-azure/

So I deployed the ingress-nginx as described there.

One problem was cert-manager and the DNS validation for the domain. I had the domain on AzureDNS, but I'm pretty sure the free option (CloudFlare DNS) would have worked just as well. With a more recent version of cert-manager (0.14.2, applied with kubectl from https://github.com/jetstack/cert-manager/releases/download/v0.14.2/cert-manager.yaml), some of the YAML changed. Here are the examples I used:

cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: che-certificate-issuer
spec:
  acme:
    email: markus@example.com
    privateKeySecretRef:
      name: letsencrypt
    server: https://acme-v02.api.letsencrypt.org/directory
    solvers:
    - dns01:
        azuredns:
          # Service principal clientId (It's the value printed in the previous lines: echo ID=$AZURE_CERT_MANAGER_SP_APP_ID )
          clientID: <client-id>
          # A secretKeyRef to a service principal ClientSecret (password)
          # ref: https://docs.microsoft.com/en-us/azure/container-service/kubernetes/container-service-kubernetes-service-principal
          clientSecretSecretRef:
            name: azuredns-config
            key: CLIENT_SECRET
          # Azure subscription Id that can be obtained with command:
          # $ az account show  | jq -r '.id'
          subscriptionID: <subscription-id>
          # Azure AD tenant Id that can be obtained with command:
          # $ az account show  | jq -r '.tenantId'
          tenantID: <tenant-id>
          resourceGroupName: <resource-group-you-put-your-ide-domain-into>
          # The DNS Zone to use
          hostedZoneName: ide.example.com
EOF

and for the cert:

cat <<EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1alpha2
kind: Certificate
metadata:
 name: che-tls
 namespace: che
spec:
 secretName: che-tls
 issuerRef:
   name: che-certificate-issuer
   kind: ClusterIssuer
 dnsNames:
   - '*.ide.example.com'
EOF
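Once both objects are applied, issuance can take a few minutes while the DNS-01 challenge propagates. A quick way to watch progress (a command sketch against the resource names used above; requires a running cluster):

```shell
# Watch the Certificate until READY becomes True; on failure, the events
# in `describe` show which ACME challenge step is still pending.
kubectl get certificate che-tls -n che
kubectl describe certificate che-tls -n che
kubectl get secret che-tls -n che   # the secret appears once issuance succeeds
```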

Also, for chectl to be able to deploy, I had to separate the ca.crt from the tls.crt, which seems to be a known bug:

kubectl patch secret \
  -n che che-tls \
  -p="{\"data\":{\"ca.crt\": \"$(kubectl get secret \
  -n che che-tls \
  -o json -o=jsonpath="{.data.tls\.crt}" \
  | base64 -d | awk 'f;/-----END CERTIFICATE-----/{f=1}' - | base64 -w 0)\"}}"
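The awk filter is the interesting part of that patch: it drops everything up to and including the first `END CERTIFICATE` marker, so only the CA chain remains. A self-contained illustration on a dummy bundle (fake certificate contents, just to show the behavior):

```shell
# A fake two-certificate bundle: leaf certificate first, CA chain second.
bundle='-----BEGIN CERTIFICATE-----
LEAF
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
CA-CHAIN
-----END CERTIFICATE-----'

# `f;/-----END CERTIFICATE-----/{f=1}` prints nothing until the first
# END marker has passed, so only the CA certificate(s) are emitted.
ca_only=$(printf '%s\n' "$bundle" | awk 'f;/-----END CERTIFICATE-----/{f=1}')
printf '%s\n' "$ca_only"
```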

I will write up a more complete howto next weekend. I hope for now this helps anyone searching for Che and k3s who finds this issue.

amisevsk commented 4 years ago

Thanks for trying this out @petzsch, it's great you were able to get up and running! If you have time, it would be great to contribute something to the Che documentation.

I haven't played with k3s so I'm kind of useless here.

sr229 commented 4 years ago

I can review the document when it gets PRed. We would need a K3s guide.

petzsch commented 4 years ago

Still waiting for my employer to provide me with 2 new VPSes for setting up Che+k3s again on a clean install.

While doing so, I'll document my steps taken. And create a PR.

At the moment I'm struggling to get the che-doc repository loaded in Che. Maybe you've seen the issue that I created for that?

https://github.com/eclipse/che/issues/16624

jekyll starts when the container starts, binding to the default port and serving from the wrong working dir. I believe that to be an issue with the Dockerfile, but maybe I am doing it wrong.

mash-graz commented 4 years ago

i also use k3s for most of my deployments, because it's very user friendly and resource saving.

one of the nice features of k3s is that you can easily prepare a setup with static configuration files in a manifests directory, and it will bring up the whole installation on startup. this also works with custom helm charts.

i personally would prefer this kind of static setup instead of manual invocation of chectl.

as already mentioned by others, i also do not use the default traefik ingress included in k3s, but replace it with traefik2. this works very well in practice and comes with very comfortable automated letsencrypt handling.

if you want to see an example of this kind of deployment, take a look at https://gitlab.com/mur-at-public/kube , although it's unfortunately documented only in German.

i'll take a look at whether i can deploy che in this kind of environment as well.

metlos commented 4 years ago

Note that theoretically, we could be able to support traefik or traefik2 if they support path rewriting in their ingress config. See https://github.com/eclipse/che/blob/master/assembly/assembly-wsmaster-war/src/main/webapp/WEB-INF/classes/che/che.properties#L391 and https://github.com/eclipse/che/blob/master/assembly/assembly-wsmaster-war/src/main/webapp/WEB-INF/classes/che/che.properties#L402.

If we can come up with a set of ingress annotations for traefik that would be equivalent to the default nginx ones, everything should theoretically work.
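As a rough illustration of what such a mapping might look like, the nginx rewrite annotation has a Traefik 1.x counterpart. The manifest below is an unverified sketch with made-up names (`example-rewrite`, `example-svc`); the annotation spelling is from Traefik 1.x documentation and should be checked against the traefik version in use:

```shell
cat <<EOF | kubectl apply -f -
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: example-rewrite
  annotations:
    # The nginx equivalent of this annotation would be:
    #   nginx.ingress.kubernetes.io/rewrite-target: /
    traefik.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: example.ide.example.com
    http:
      paths:
      - path: /api
        backend:
          serviceName: example-svc
          servicePort: 8080
EOF
```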

mash-graz commented 4 years ago

yes, that's possible, but you'll lose a lot of useful capabilities in this case (e.g. all the middleware stuff, raw TCP/UDP port forwarding, etc.). there are good reasons why most of the more powerful kubernetes ingress solutions have meanwhile changed their behavior in a similar manner to istio, to provide additional features.

btw: another important feature of k3s, which i forgot to mention in my previous post, is the local storage provider, which could be very useful in minimalist che setups.

right now i'm still fighting with the fact that the present che documentation no longer describes a simple setup with native kubernetes manifests or helm charts. chectl may be a very handy solution for many users, but it makes it rather hard to automate tasks in a kubernetes-like manner.

does anybody know if this tool is at least able to output its commands as manifest sequences on stdout? that's the compromise chosen e.g. by the linkerd2 config tool and similar software as a workaround to support both kinds of use. i couldn't find such an option in the chectl documentation so far.

petzsch commented 4 years ago

I might have the wrong perspective about the local path storage provider: But it gave me quite a bit of headache with its placement constraints (I think that's what they were called) when I tried to scale down my cluster. I had volumes pinned to all of my nodes and couldn't find a way to migrate them to other worker nodes.

That's why I wanted to have a look at longhorn and give that a try.

Personally, chectl hasn't been an issue with k3s. Though I would also prefer a more transparent workflow, like with helm charts. I assume there were reasons for this decision.

mash-graz commented 4 years ago

I might have the wrong perspective about the local path storage provider: ...

for real multi-node clusters you will usually choose and set up the most suitable storage solution with care (ceph etc.), but k3s is in most cases used in rather simple scenarios, just like minikube.

i only mentioned this detail because, even in this kind of minimalist use, the type of persistent volume your jobs run on makes an observable performance difference (e.g. when compiling code). that's why it's often useful to take advantage of these implementation-specific capabilities where they are available, even in very simple setups.

Personally chectl hasn't been an issue with k3s. Though I would also prefer a more transparent workflow like with helm charts. Assuming there were reasons for this decision.

yes -- this may be the case; nevertheless i personally also value transparency and comprehensible control.

sr229 commented 4 years ago

Some updates: I think we can use the Traefik ingress via the extensions/v1beta1 API group, which should work for most people's needs.

sr229 commented 4 years ago

Otherwise I think Che should be ready for primetime for this, will give it a check later.

che-bot commented 3 years ago


che-bot commented 2 years ago


amisevsk commented 2 years ago

/remove-lifecycle stale

che-bot commented 1 year ago


DecDuck commented 1 year ago

Did anything come of this? I don't see K3s documentation here.

djdanielsson commented 1 year ago

I am interested in this as well, but I don't see a PR connected to its closing, so I think it auto-closed due to inactivity.

amisevsk commented 1 year ago

/remove-lifecycle stale

amisevsk commented 1 year ago

I'm not sure the status of this issue; I personally don't have time to set up a k3s cluster and start testing. A related issue that may be of interest is https://github.com/devfile/devworkspace-operator/issues/1068, which would be a component of enabling support for traefik as the ingress controller.

gbonnefille commented 1 year ago

I'm not sure the status of this issue; I personally don't have time to set up a k3s cluster and start testing. A related issue that may be of interest is devfile/devworkspace-operator#1068, which would be a component of enabling support for traefik as the ingress controller.

I'm targeting such a use case. Currently, I only run the devworkspace-operator in a k3s cluster. I encountered some issues around the hardcoded nginx Ingress Controller reference.

One workaround is to declare a nginx IngressClass routing to traefik ingress controller.
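For reference, that workaround can be sketched as a single object. The controller name below is the one traefik's documentation registers for its ingress provider; verify it against the traefik version shipped with your k3s:

```shell
cat <<EOF | kubectl apply -f -
# An IngressClass named "nginx" that is actually implemented by traefik,
# so components that hardcode the nginx class still get their Ingresses routed.
apiVersion: networking.k8s.io/v1
kind: IngressClass
metadata:
  name: nginx
spec:
  controller: traefik.io/ingress-controller
EOF
```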

Then we faced concurrent access to the Ingress resource: traefik tries to update the Ingress resource, and DWO reconciles it back to its own version. I'm not sure, but I think it is related to the ingressClassName field: since it is not set, traefik tries to set its own name.

So, I propose this PR: https://github.com/devfile/devworkspace-operator/pull/1143

gbonnefille commented 1 year ago

I made some tests and I can state that https://github.com/devfile/devworkspace-operator/pull/1143 fixes the ability of traefik to handle the Kind=Ingress resources generated by DWO.

simonjcarr commented 1 year ago

@amisevsk

I personally don't have time to set up a k3s cluster and start testing.

curl -sfL https://get.k3s.io | sh -
gbonnefille commented 1 year ago

@amisevsk

I personally don't have time to set up a k3s cluster and start testing.

curl -sfL https://get.k3s.io | sh -

I think a simpler solution can be to use k3d (https://k3d.io/) in order to deploy a K3S inside docker, without any impact on your host.
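For example, a throwaway cluster for testing can be created and torn down entirely inside docker (flags as in current k3d releases; adjust if your version differs):

```shell
# Create a disposable k3s-in-docker cluster with one server and one agent,
# then delete it when done. Requires docker and the k3d binary.
k3d cluster create che-test --agents 1
kubectl cluster-info          # k3d merges the kubeconfig automatically
k3d cluster delete che-test
```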

amisevsk commented 1 year ago

Half the battle is setting up the cluster, the other half is finding time to run and test DWO on it :)

It's still on my TODO list, but in the meantime, reporting any issues or submitting PRs is more than welcome.