apache / kyuubi

Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
https://kyuubi.apache.org/
Apache License 2.0

[Improvement] Launching k8s engine pods with respective users #6784

Open Madhukar525722 opened 3 weeks ago

Madhukar525722 commented 3 weeks ago

What would you like to be improved?

Spark submit:
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https:///api/v1/namespaces/genai/pods. Message: Forbidden! User doesn't have permission. pods is forbidden: User "madlnu" cannot create resource "pods" in API group "" in the namespace "genai".

Kyuubi engine launch with share level USER:
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https:///api/v1/namespaces/genai/pods. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods is forbidden: User "system:serviceaccount:scaas:spark" cannot create resource "pods" in API group "" in the namespace "genai".

When a USER share level engine is launched in a k8s cluster, the pod creation request is made with the identity that the Kyuubi server runs as, rather than with the actual user's identity.

Configurations:
kyuubi.authentication=KERBEROS
kyuubi.spnego.keytab=spnego.keytab
kyuubi.spnego.principal=spnego@DOMAIN.COM
kyuubi.kinit.principal=hive@DOMAIN.COM
kyuubi.kinit.keytab=hive.keytab
spark.kubernetes.namespace=genai
kyuubi.kubernetes.master.address=k8s://https://
spark.master=k8s://https://
kyuubi.kubernetes.namespace=scaas
spark.submit.deployMode=cluster
spark.kubernetes.authenticate.serviceAccountName=spark
spark.kubernetes.authenticate.driver.serviceAccountName=spark

How should we improve?

The expectation is that authentication as the actual user happens when the engine pods are launched.

pan3793 commented 3 weeks ago

Good question. In the Spark on YARN case, we leverage the Hadoop user impersonation mechanism to avoid managing every user's keytab. How do you manage the K8s credentials for all users?

Madhukar525722 commented 3 weeks ago

Hi @pan3793, one thing I came across is the webhook admission controller, which can be used to achieve impersonation in k8s: we can intercept the request to the k8s API and change the user. Would such an implementation be acceptable?

pan3793 commented 3 weeks ago

I am not familiar with this area. Could you provide some docs/blogs that describe this solution? Also, is it possible to demonstrate it on minikube (our CI runs on minikube)?

Madhukar525722 commented 3 weeks ago

Hi @pan3793, here is some context:

  1. Official documentation - https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/
  2. Medium - https://blog.devgenius.io/k8s-for-data-engineers-admission-controller-371758f90107
  3. Applications like Apache YuniKorn use an admission controller for their use case.

Ultimately, we can achieve this by mutating the owner to the real user.
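
To make the mutation idea a bit more concrete, here is a minimal sketch of the response a mutating admission webhook could send back for a pod creation request. It only shows the AdmissionReview/JSONPatch shape described in the official documentation linked above. The function name `build_admission_response`, the annotation key `example.com/real-user`, and the way `real_user` is obtained are all assumptions for illustration and are not something Kyuubi or Spark provides today. Also note that admission webhooks run after the API server's authorization step, so a patch like this records the real user on the pod but does not by itself change which identity the RBAC check is evaluated against.

```python
import base64
import json


def build_admission_response(review: dict, real_user: str) -> dict:
    """Build an AdmissionReview response that tags the pod with the real user.

    `review` is the AdmissionReview request body sent by the API server.
    `real_user` is assumed to be resolved by the webhook somehow (hypothetical).
    """
    request = review["request"]
    pod = request["object"]

    patch = []
    # A JSONPatch "add" fails if the parent map is missing, so create it first.
    if not pod.get("metadata", {}).get("annotations"):
        patch.append({"op": "add", "path": "/metadata/annotations", "value": {}})
    # "/" inside a key is escaped as "~1" in a JSON Pointer path.
    # The annotation key is hypothetical.
    patch.append({
        "op": "add",
        "path": "/metadata/annotations/example.com~1real-user",
        "value": real_user,
    })

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request["uid"],  # must echo the request uid
            "allowed": True,
            "patchType": "JSONPatch",
            # The patch is sent as a base64-encoded JSON array.
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    }
```

A webhook server would call this from its HTTPS handler and return the result as the JSON response body; how the real user is determined for a given pod is the open question in this thread.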

Sure, I will try to create a demo on a minikube cluster.