fabric8-services / fabric8-tenant

Service responsible for provisioning and updating the tenant scoped services
Apache License 2.0

tenant lazy initialization #746

Closed nurali-techie closed 5 years ago

nurali-techie commented 5 years ago

Problem: Currently, when a new OSIO user logs in for the very first time, the tenant is initialized and all FIVE namespaces (user, che, jenkins, stage, run) are created in OpenShift for the user account. This wastes resources: the user has not yet done any meaningful work, but resources for all FIVE namespaces are already allocated.

In particular, if the user is a che_user (che.openshift.io) who is mainly interested in working with che, all FIVE namespaces are still created even though only the che namespace is used.

Note: jenkins_namespace takes significantly more resources than the other namespaces.

Solution: We should provide tenant_lazy_initialization support so that namespaces are NOT created at first login but at the first use of the given namespace. For example, jenkins_namespace should be created when the first build is triggered.

nurali-techie commented 5 years ago

Current Design:

how to identify which namespace to be created

OpenShift (OS) provides api_endpoints, which OSIO uses to interact with OS. Each api_endpoint contains the namespace name.

For example: https://api.starter-us-east-2a.openshift.com/oapi/v1/namespaces/john-preview/buildconfigs

Here, the namespace_name is john-preview, and the endpoint returns the buildconfigs for that namespace.
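
To make the idea concrete, here is a minimal Go sketch of pulling the namespace name out of such an endpoint. The function names are hypothetical and are not taken from oso_proxy or tenant_service. The suffix convention used in namespaceType (-che, -jenkins, -stage, -run, anything else treated as "user") is an assumption inferred from how namespaces are referred to in this thread.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// namespaceFromURL returns the path segment that follows "namespaces" in an
// OpenShift API URL. The second result is false when no such segment exists.
func namespaceFromURL(rawURL string) (string, bool) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", false
	}
	segments := strings.Split(strings.Trim(u.Path, "/"), "/")
	for i, s := range segments {
		if s == "namespaces" && i+1 < len(segments) && segments[i+1] != "" {
			return segments[i+1], true
		}
	}
	return "", false
}

// namespaceType guesses the namespace type from its name.
// ASSUMPTION: names end in -che, -jenkins, -stage or -run; anything else is
// treated as the "user" namespace.
func namespaceType(name string) string {
	for _, t := range []string{"che", "jenkins", "stage", "run"} {
		if strings.HasSuffix(name, "-"+t) {
			return t
		}
	}
	return "user"
}

func main() {
	endpoint := "https://api.starter-us-east-2a.openshift.com/oapi/v1/namespaces/john-preview/buildconfigs"
	if ns, ok := namespaceFromURL(endpoint); ok {
		fmt.Println(ns, namespaceType(ns))
	}
}
```

A proxy that sees every call could use something like this to decide which namespace a request needs before forwarding it.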

role of oso_proxy:

Ideally, all interaction from OSIO to OS should go through the oso_proxy service, but that is NOT the case as of now. Below is the list of endpoints whose calls go through oso_proxy.

  1. namespace: "user", endpoints: builds and buildconfigs
  2. namespace: "che", endpoints: all

role of tenant_service:

Tenant initialization process when a user logs in for the very first time:

role of wit_service:

The wit service uses tenant_service mainly for deployment-related functionality. The usage pattern is tricky to track: the calling code is scattered and some of the usage is indirect.

Below is the list of places where tenant_service is accessed with a namespace (directly or indirectly).

  1. deployments_urlprovider.go newTenantURLProviderFromTenant() { ns = "user" }

  2. /controller/codebase.go getCheNamespace() { ns = "che" }

  3. deployments_access.go whole file, need to check what all ns it uses, mostly "user", "stage", "run"

  4. /controller/user_service.go Show() { tenant_client.ShowTenant() => GET http://tenant/api/tenant (with user_token) }

role of auth_service

Below is the list of usage where tenant_service is used.

  1. /authentication/account/service/tenant.go Init() { tenant_service.SetupTenant() }

role of che_service

Below is the list of usage where relevant endpoints are called from che services.

  1. TenantDataProvider.java and che.fabric8.user_service.endpoint=https://api.openshift.io/api/user/services. It uses the GET /api/user/services endpoint of wit_service, which in turn calls the GET /api/tenant endpoint of tenant_service.

OpenShift calls from Build components:

tenant_service usage

Reference: OpenShift -> Resources -> Config Maps link / Secrets link to find service list.

| service | namespace | endpoint | comment |
| --- | --- | --- | --- |
| oso_proxy | user | /builds, /buildconfigs | |
| oso_proxy | che | all | |
| jenkins_proxy | | | |
| wit | | | |
| auth | | | |
| admin_console | | | |

chmouel commented 5 years ago

@khrm can you look at this please ^

nurali-techie commented 5 years ago

Solution

Solution1 - change all services

First, stop creating all FIVE namespaces during user login. Second, change every service that uses tenant_endpoint GET /api/tenant so that it passes namespace_param.

This requires changing many services in many places. Our main target for tenant_lazy_init is the che_user, so this solution is not worth the amount of work it needs.

Solution2 - change che service

First, stop creating all FIVE namespaces during user login. Second, change the che service so that it passes namespace_param=che when calling tenant_service (directly or indirectly).

This requires changing only the che (or oso_proxy) service. If the user is a che_user, tenant_lazy_init is achieved. If the user is an osio_user, there is NO tenant_lazy_init and NO change in behavior.

MatousJobanek commented 5 years ago

role of tenant_service

Namespace creation

As mentioned earlier, to minimize resource consumption we would like to limit namespace creation. Because the jenkins namespace is the heaviest in terms of resources, we would like to avoid creating it unless it is really needed. That is the case when the user uses only Che, in other words, accesses che.openshift.io and doesn't use any other feature of openshift.io. To do that, the tenant service needs to know when it should create only the che namespace and not the others. To achieve this, it would be great if the che service could send either a POST or a GET request with the parameter ns=che. To be precise, by GET request I mean a call to the /api/tenant endpoint (which carries the user token), not the /api/tenants/{user_id} endpoint, which carries an SA token (regardless of whether the call goes via the WIT service or directly). If che doesn't call GET /api/tenant and only calls GET /api/tenants/{user_id}, then it is necessary to add a POST call to /api/tenant?ns=che when a user logs in.
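
As an illustration of the ns parameter, the following Go sketch shows how a handler might turn the query string of /api/tenant into the list of namespace types to provision. It is not the tenant service implementation: the function name, the fallback to all five namespaces when ns is absent, and the rejection of unknown values are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net/url"
)

// The five namespace types named in this issue.
var allNamespaceTypes = []string{"user", "che", "jenkins", "stage", "run"}

// namespacesToProvision parses a raw query string such as "ns=che" and
// returns the namespace types that should be created.
// ASSUMPTION: a missing ns parameter keeps today's behavior (all five).
func namespacesToProvision(rawQuery string) ([]string, error) {
	query, err := url.ParseQuery(rawQuery)
	if err != nil {
		return nil, err
	}
	ns := query.Get("ns")
	if ns == "" {
		// Return a copy so callers cannot modify the package-level slice.
		return append([]string(nil), allNamespaceTypes...), nil
	}
	for _, t := range allNamespaceTypes {
		if t == ns {
			return []string{ns}, nil
		}
	}
	return nil, fmt.Errorf("unknown namespace type %q", ns)
}

func main() {
	for _, q := range []string{"ns=che", ""} {
		types, err := namespacesToProvision(q)
		fmt.Printf("query=%q types=%v err=%v\n", q, types, err)
	}
}
```

Keeping the no-parameter case unchanged means existing callers (wit, auth) would not need to change for che to benefit.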

User resource quota

As you can imagine, creating only a limited number of namespaces (e.g. only the che namespace) brings a risk: users could use the remaining resources dedicated to their accounts in the OS cluster to create their own namespace(s) for coin mining or any other purpose. We could solve it by:

  1. having dynamic resource quotas for the accounts - by default, a user has no resources dedicated to their account, and tenant dynamically increases the quota based on the number of created namespaces. But I'm not sure if such dynamic quotas are possible.
  2. dummy namespace - when only the che namespace is created, the rest of the quota is consumed by a dummy namespace in which the user has no permission to run or edit anything. When we need to provision the remaining namespaces, this dummy namespace is removed and replaced by the 4 remaining namespaces (run, stage, jenkins, user).
nurali-techie commented 5 years ago

I will follow up with @ibuziuk for che interaction with osio_service for che_user.

nurali-techie commented 5 years ago

che_user login to https://che.prod-preview.openshift.io sequence

Here are the different osio_services APIs that get called when a che_user tries to log in from the https://che.prod-preview.openshift.io url:

  1. GET https://auth.prod-preview.openshift.io/api/.well-known/openid-configuration - this returns token_endpoint url
  2. POST https://auth.prod-preview.openshift.io/api/token - call to token_endpoint url and this returns osio_token for user
  3. GET https://api.prod-preview.openshift.io/api/users?filter%5Busername%5D=nvirani-preview - this returns cluster_url
  4. GET https://auth.prod-preview.openshift.io/api/token?for=https%3A%2F%2Fapi.starter-us-east-2a.openshift.com%2F - pass cluster_url and this returns oso_token for user
  5. GET https://api.prod-preview.openshift.io/api/user/services - this returns FIVE namespaces details
  6. GET https://api.prod-preview.openshift.io/api/user - this returns user details
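
Steps 3 and 4 pass values through query parameters that have to be percent-encoded. The Go sketch below rebuilds those two URLs with the standard net/url package; the helper names are hypothetical and only the URL shapes come from the list above.

```go
package main

import (
	"fmt"
	"net/url"
)

// usersByNameURL builds the step 3 request, which filters users by username.
func usersByNameURL(apiBase, username string) string {
	q := url.Values{}
	q.Set("filter[username]", username)
	return apiBase + "/api/users?" + q.Encode()
}

// clusterTokenURL builds the step 4 request, which asks auth for the
// oso_token that belongs to the given cluster_url.
func clusterTokenURL(authBase, clusterURL string) string {
	q := url.Values{}
	q.Set("for", clusterURL)
	return authBase + "/api/token?" + q.Encode()
}

func main() {
	fmt.Println(usersByNameURL("https://api.prod-preview.openshift.io", "nvirani-preview"))
	fmt.Println(clusterTokenURL("https://auth.prod-preview.openshift.io", "https://api.starter-us-east-2a.openshift.com/"))
}
```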

Here are kibana logs links for different osio_services:

Points to be noted,

  1. che_user login first calls wit GET /api/user/services, which in turn calls tenant GET /api/tenant
  2. later it calls auth GET /api/user, which in turn calls (in parallel) tenant POST /api/tenant

@ibuziuk I checked the kibana_logs and deduced the above; could you kindly confirm the steps? :-)

ibuziuk commented 5 years ago

@nurali-techie sorry, I got a bit lost in this issue. What exactly are you planning to do with che calls of api/user/services etc ?

nurali-techie commented 5 years ago

@ibuziuk to support tenant_lazy_init, we want one new param on the api call. The new param will be namespace, and che in particular should pass namespace=che. This will allow us to support tenant_lazy_init, where we create only the che namespace and skip the other namespaces (user, jenkins, stage, run). With this in place, a user who only uses che gets only the che namespace. I can explain the rest over MM.

ibuziuk commented 5 years ago

@nurali-techie I need to check with @davidfestal who implemented e2e registration / provisioning flow for che.openshift.io. David, could you please comment how easy do you think it would be to adjust the provisioning flow for che.openshift.io to support namespace=che parameter ?

ibuziuk commented 5 years ago

@nurali-techie also please take into account that our sprint 162 is already planned, so if there is smth. required to be implemented on our end we need to be notified in advance, so that we could prioritize it with @l0rd

nurali-techie commented 5 years ago

@davidfestal for now, please just cross-check whether the che login steps mentioned in the comment are correct, and let us know if anything is missing. You don't need to look into passing the namespace=che param atm.

I used my own user (nvirani-preview) for the investigation, and this user was created in the past. If a new che_user is created and logs in for the very first time, are the steps mentioned above enough, or are there a few more steps?

Note: We are mainly concerned with the osio apis that che uses during che_user login.

nurali-techie commented 5 years ago

@alexeykazakov @MatousJobanek ultimately we need to introduce a namespace param on the tenant_service api. tenant_service is not exposed externally (no route) and is used by other services (wit, auth, oso_proxy), so we also need to introduce namespace_param on one of the external services.

As per current findings, it seems that we need to introduce a namespace param for GET /api/user/services in wit_service, which in turn propagates namespace_param to GET /api/tenant in tenant_service. Also, POST /api/tenant in tenant_service should stop creating all FIVE namespaces.

At the same time, it looks weird to pass namespace_param to /api/user/services; the param does not fit well with that api.
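
A rough Go sketch of the propagation described here: the query string received on GET /api/user/services is inspected, and only the namespace parameter is copied onto the internal call to GET /api/tenant. The function name is hypothetical, the http://tenant base is the internal address mentioned earlier in this thread, and the parameter name "namespace" is an assumption since the thread also uses "ns".

```go
package main

import (
	"fmt"
	"net/url"
)

// tenantRequestURL returns the tenant_service URL that wit_service would
// call for an incoming /api/user/services request.
// ASSUMPTION: only the "namespace" parameter is forwarded; everything else
// on the incoming query is dropped.
func tenantRequestURL(tenantBase, incomingRawQuery string) (string, error) {
	incoming, err := url.ParseQuery(incomingRawQuery)
	if err != nil {
		return "", err
	}
	target := tenantBase + "/api/tenant"
	if ns := incoming.Get("namespace"); ns != "" {
		out := url.Values{}
		out.Set("namespace", ns)
		target += "?" + out.Encode()
	}
	return target, nil
}

func main() {
	for _, q := range []string{"namespace=che", ""} {
		target, err := tenantRequestURL("http://tenant", q)
		fmt.Println(target, err)
	}
}
```

Callers that send no parameter produce the same tenant call as today, so the change stays backward compatible.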

davidfestal commented 5 years ago

@nurali-techie The place where the GET /api/user/services request is done is here: https://github.com/redhat-developer/rh-che/blob/master/plugins/fabric8-end2end-flow/src/main/resources/end2end/files/RhCheKeycloak.js#L328 By calling this endpoint, we expect the call to indirectly trigger the namespace setup. Afaik there would be no problem at all adding a parameter here to set up only the Che namespace.

We would also add the parameter here: https://github.com/redhat-developer/rh-che/blob/master/plugins/fabric8-end2end-flow/src/main/resources/end2end/files/RhCheKeycloak.js#L345 which is where we poll whether the namespaces have been set up or not.
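
The polling step can be pictured with the Go sketch below. It is not the rh-che code (which is JavaScript); the function name, the attempt limit and the interval are all assumptions. In the real flow the check function would issue GET /api/user/services with the new parameter and report whether it answered successfully.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// pollUntilReady calls check up to maxAttempts times and sleeps for interval
// between attempts. It stops early when check reports ready or fails.
func pollUntilReady(check func() (bool, error), maxAttempts int, interval time.Duration) error {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		ready, err := check()
		if err != nil {
			return err
		}
		if ready {
			return nil
		}
		if attempt < maxAttempts {
			time.Sleep(interval)
		}
	}
	return errors.New("namespaces not ready after the maximum number of attempts")
}

func main() {
	calls := 0
	// Stand-in for the HTTP check: reports ready on the third call.
	check := func() (bool, error) {
		calls++
		return calls >= 3, nil
	}
	err := pollUntilReady(check, 5, 10*time.Millisecond)
	fmt.Println("calls:", calls, "err:", err)
}
```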

davidfestal commented 5 years ago

> later it call auth GET /api/user which in turns call (in parallel) tenant POST /api/tenant

It only calls GET /api/user once the namespaces have been set up (that is, after user/services has returned a successful answer at least once). If necessary I could also add a parameter here.

alexeykazakov commented 5 years ago

> also please take into account that our sprint 162 is already planned, so if there is smth. required to be implemented on our end we need to be notified in advance, so that we could prioritize it

@ibuziuk that lazy initialization is purely for che.openshift.io benefit and is coming to us as a requirement from that direction ;) Not the other way around.

alexeykazakov commented 5 years ago

Also we should make sure we do not provision all missing/not-initialized-yet namespaces during tenant update.

ibuziuk commented 5 years ago

@alexeykazakov I do understand that it is an important internal resource usage enhancement / optimization that will technically allow provisioning more users for che.openshift.io. Unfortunately, I became aware of this issue only yesterday :-( We are always glad to help, but our sprint is already planned and it would be really problematic to add yet another task in case smth. is expected to be implemented on our end (as I understand, that is not the case for this task). Please let me know before the next planning if some cross-team effort is expected for this or any other task, so that we can prioritize and plan our sprint accordingly.

alexeykazakov commented 5 years ago

> yes, we can change this logic, but at least user namespace is expected to be there for Login to OSO installer, otherwise oc functionality will not work inside the workspace correctly

@ibuziuk can you elaborate on that please? Are you saying that che is using the user namespace too (and not only the che-* one)? If so, what does it use it for?

ibuziuk commented 5 years ago

@alexeykazakov the user namespace is used by the Login to OSO installer, which allows using oc from the workspace terminal against the user namespace.

MatousJobanek commented 5 years ago

Ok, it seems that solving the lazy initialization (parametrization) only for the -che namespace (users who come in via che.openshift.io) brings more complications than benefits. As a result of this discussion, I would propose to parametrize the lazy initialization only for the -jenkins namespace as the first step.

Current flow:

  1. user logs in
  2. WIT? sends POST call to tenant service to provision namespaces
  3. all five namespaces are created

Proposed flow

  1. user logs in
  2. WIT? sends POST call to tenant service to provision namespaces (without any parameter)
  3. all five namespaces are created, but the -jenkins namespace is empty (no deployment nor PVC)
  4. user creates a space/imports codebase
  5. WIT? sends either POST or GET call to tenant service to provision -jenkins namespace (with parameter ns=jenkins)
  6. tenant service adds necessary objects to -jenkins namespace
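
The proposed flow can be modeled with a small in-memory Go sketch. It only tracks which namespaces exist and which have had their objects deployed; the type and method names are hypothetical, and nothing here talks to OpenShift.

```go
package main

import "fmt"

var namespaceTypes = []string{"user", "che", "jenkins", "stage", "run"}

// tenant records, per namespace type, whether the namespace exists and
// whether its objects (deployments, PVCs, ...) have been added.
type tenant struct {
	created   map[string]bool
	populated map[string]bool
}

func newTenant() *tenant {
	return &tenant{created: map[string]bool{}, populated: map[string]bool{}}
}

// provision models a POST or GET call to the tenant service.
// With an empty ns all five namespaces are created but -jenkins stays empty.
// With ns set, that namespace is created if needed and its objects are added.
func (tn *tenant) provision(ns string) error {
	if ns == "" {
		for _, n := range namespaceTypes {
			tn.created[n] = true
			if n != "jenkins" {
				tn.populated[n] = true
			}
		}
		return nil
	}
	for _, n := range namespaceTypes {
		if n == ns {
			tn.created[ns] = true
			tn.populated[ns] = true
			return nil
		}
	}
	return fmt.Errorf("unknown namespace type %q", ns)
}

func main() {
	tn := newTenant()
	_ = tn.provision("") // steps 1-3: login, no parameter
	fmt.Println("after login: created =", tn.created["jenkins"], "populated =", tn.populated["jenkins"])
	_ = tn.provision("jenkins") // steps 4-6: first space or codebase
	fmt.Println("after ns=jenkins: populated =", tn.populated["jenkins"])
}
```

Calling provision without a parameter a second time leaves an already populated -jenkins namespace untouched, which matches the concern about not provisioning everything again during tenant update.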

In other words, the tenant service deploys the -jenkins bits only when needed, that is, when there is either a POST or a GET call containing the ns=jenkins parameter. With this we would solve two things at the same time:

alexeykazakov commented 5 years ago

> Current flow:
>
>   1. user logs in
>   2. WIT? sends POST call to tenant service to provision namespaces

It's Auth. Login is the first time when /api/user endpoint is called which also calls the tenant service - https://github.com/fabric8-services/fabric8-auth/blob/3a44ba7ddd1ebe96cae360c1cecfb7f7264d69d6/controller/user.go#L69

nurali-techie commented 5 years ago

@MatousJobanek @alexeykazakov I have spent some time finding the best place to call create_jenkins_namespace. I guess calling it during create_space (code_link) would be the safe place, as this is the first thing an osio_user will do.

alexeykazakov commented 5 years ago

Wouldn't the create codebase endpoint be a better place? https://github.com/fabric8-services/fabric8-wit/blob/master/controller/space_codebases.go#L27 It's possible to create a space without creating an app/codebase.

nurali-techie commented 5 years ago

@alexeykazakov yes, calling create_jenkins_namespace during create_codebase is also an option, and for that we need to check with the launcher team. Here is what I found in launcher_code.

Both the "Create a new codebase" and the "Import an existing codebase" wizards from osio are handled by launcher_backend.

Create a new codebase:

  1. POST https://forge.api.prod-preview.openshift.io/api/osio/launch
  2. Call will be handled by OsioLaunchMissionControl#launch() code_link
  3. It first calls triggerBuild code_link
  4. It then calls createCodebase code_link
  5. The call then reaches wit_service space_codebases.go code_link

Here, it seems that step 3 requires the presence of jenkins_namespace. It means we need to change launcher_code to call createCodebase first and triggerBuild second.

Import an existing codebase: The call sequence is similar to "Create a new codebase", but "OsioImportMissionControl#launch()" is called.

  1. POST https://forge.api.prod-preview.openshift.io/api/osio/import
  2. Call will be handled by OsioImportMissionControl#launch() code_link
  3. It first calls triggerBuild code_link
  4. It then calls createCodebase code_link
  5. The call then reaches wit_service space_codebases.go code_link

So we need to check with the launcher team whether the call sequence "needs to change" and "can be changed" in both places (first call createCodebase, then triggerBuild), and we need to change wit_service to call create_jenkins_namespace from create_codebase.

@alexeykazakov please add launcher team contact in case we want to go with this option.

alexeykazakov commented 5 years ago

Yeah right. Deploying Jenkins during codebase creation in WIT seems to be too late. This is the launcher code which does that - https://github.com/fabric8-launcher/launcher-backend/blob/9f13fb0fe092d8a3d8dbcefae5191835fb9766a2/addons/osio-addon/src/main/java/io/fabric8/launcher/osio/OsioLaunchMissionControl.java#L71-L102

nurali-techie commented 5 years ago

@MatousJobanek @alexeykazakov we deleted jenkins_namespace for one of the users and then tried to log in to osio and che to see whether things work normally without jenkins_namespace. The testing is green: things work normally when jenkins_namespace is not present.

MatousJobanek commented 5 years ago

> we deleted jenkins_namespace for one of the users and then tried to log in to osio and che to see whether things work normally without jenkins_namespace. The testing is green: things work normally when jenkins_namespace is not present.

just to be precise - the -jenkins namespace wasn't deleted; it was just cleaned by calling oc delete all,pvc,cm --all. The namespace itself stayed present.

nurali-techie commented 5 years ago

Closing as no longer needed.