eclipse-che / che

Kubernetes based Cloud Development Environments for Enterprise Teams
http://eclipse.org/che
Eclipse Public License 2.0

Use a reverse proxy to avoid routes/ingress creation at workspace startup #12914

Closed: l0rd closed this issue 4 years ago

l0rd commented 5 years ago

Description

Over the past 2 years of running Che in production, we have seen that OpenShift routes do not always fit our needs (we need to bring up 3 or more routes within a few seconds for every user workspace). The same applies to Kubernetes Ingresses (still in beta).

Hence the need to investigate alternatives. One proposal is to pre-create a single route/ingress for the Che server and point it at a reverse proxy that routes all workspace traffic (e.g. by reusing the JWT proxy, Envoy or Traefik).

That would allow us to:

If this approach is validated, we could divide the work into 4 steps

UPDATE Another important use case is associated with this issue: allowing Che workspaces to run on OpenShift (where the single-host strategy is not available) when using wildcard SSL certificates is not possible.

Implementation

l0rd commented 5 years ago

cc @gorkem

che-bot commented 5 years ago

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

skabashnyuk commented 5 years ago

/remove-lifecycle stale

benoitf commented 5 years ago

should it go into a backlog?

l0rd commented 5 years ago

We should put it into a backlog during prioritization

che-bot commented 4 years ago

Issues go stale after 180 days of inactivity. lifecycle/stale issues rot after an additional 7 days of inactivity and eventually close.

Mark the issue as fresh with /remove-lifecycle stale in a new comment.

If this issue is safe to close now please do so.

Moderators: Add lifecycle/frozen label to avoid stale mode.

l0rd commented 4 years ago

/remove-lifecycle stale

metlos commented 4 years ago

We've started implementing the performance tests for the individual POCs. Take a look at https://github.com/che-incubator/che-gateway-poc.

metlos commented 4 years ago

In the above-mentioned POC repository, we now have 3 POCs implemented:

We're working on haproxy-custom-image, which is very similar to nginx-custom-image but uses HAProxy as the gateway. This will let us quantify the effect of a custom controller versus externally executed commands.

We're also working on the test suite. We're developing a number of load test scenarios (https://github.com/che-incubator/che-gateway-poc/tree/master/test#testcases). We have not yet started the WebSocket and cookie-handling tests; we will start them once the haproxy-custom-image POC is implemented.

skabashnyuk commented 4 years ago

@l0rd one of the POCs that we have is a CR-based Traefik https://github.com/skabashnyuk/openshift-traefik. @metlos raised a concern about the cluster roles and cluster role bindings https://github.com/skabashnyuk/openshift-traefik/blob/master/001-rbac.yaml. How big a problem is this for us? Can we afford to require, for this feature, Traefik plus all the roles needed to read the CRs? WDYT? CC @benoitf @davidfestal

metlos commented 4 years ago

My main concern there is that a) we're creating a pod with cluster-wide permissions and b) we're creating cluster-wide "generic" CRDs (i.e. Traefik-specific CRDs like IngressRoute) that are only meant for our usage.

In other words, with the Traefik CRDs we're creating a new routing facility for the whole cluster, not just for our usage.

l0rd commented 4 years ago

For the cluster-wide permission that's ok imo under 2 conditions: it should be optional (i.e. if you do not have enough privileges you can still use Che but you need to stick with multi-host) and it should be deployed via a separate operator (so that Che Operator won't need extra privileges).
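The "optional, with a multi-host fallback" condition can be sketched as a tiny selection function. This is a hypothetical illustration only: `Exposure`, `chooseExposure` and the `gatewayAvailable` flag (e.g. "the separate gateway operator is installed and has the permissions it needs") are assumed names, not part of the Che Operator.

```go
package main

import "fmt"

// Exposure is a hypothetical name for how workspace endpoints are exposed.
type Exposure string

const (
	MultiHost Exposure = "multi-host"
	Gateway   Exposure = "gateway" // single entry point behind the reverse proxy
)

// chooseExposure honours the requested exposure when possible. If the gateway
// was requested but its separately deployed, cluster-privileged component is
// not available, it falls back to multi-host so Che keeps working.
// The second result reports whether a fallback happened.
func chooseExposure(requested Exposure, gatewayAvailable bool) (Exposure, bool) {
	if requested == Gateway && !gatewayAvailable {
		return MultiHost, true
	}
	return requested, false
}

func main() {
	e, fellBack := chooseExposure(Gateway, false)
	fmt.Println(e, fellBack) // multi-host true
}
```

Keeping the privileged part in a separate component means the check happens at runtime and the Che Operator itself never needs the extra cluster roles.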

As for IngressRoute, isn't it possible to use an Ingress with Traefik-specific annotations instead?

jfaltermeier commented 4 years ago

Hi, sorry for sidetracking a bit. I have a question about the scope of this ticket. When looking at workspace startup times recently, I noticed that a lot of time passes between the creation of the ingress and its update. Once the ingress is available, the workspace is scaled up.

Would this ticket help to avoid waiting for the ingress to update because it is pre-created already?

sparkoo commented 4 years ago

@jfaltermeier as a side effect, yes, it will help. We will have only one Ingress for the whole of Che and will do the routing to workspaces ourselves, so we can do it more efficiently than the cluster does.

metlos commented 4 years ago

We have concluded the performance tests. I have created a number of subtasks to guide us through the implementation and referenced them in the description of this epic.

We have not yet chosen the gateway solution though, because there was no clear winner. I have sent out an email to the Che-dev mailing list detailing our current thinking and progress.

metlos commented 4 years ago

Note that we have concluded our testing of the candidate solutions for a reverse proxy. We chose Traefik and will commence the implementation with #17063 - making our Rust-based POC a fully maintained controller written in Go.

To read more about the selection process and reasoning behind the choice of Traefik, please read through https://www.eclipse.org/lists/che-dev/msg03828.html

sparkoo commented 4 years ago

All issues in the scope of this epic are closed. Related single-host issues will be solved separately.