canonical / spark-operator

Spark Operator

spark-k8s service account is not created in kubeflow user's namespaces #15

Closed by grobbie 1 year ago

grobbie commented 2 years ago

When deploying the operator alongside Charmed Kubeflow, the service account is correctly created in the kubeflow namespace, but it is not created in the user's own namespace. As a result, SparkApplications created in the user's namespace do not run.

As I see it, we need to either:

(a) have an alternative service account, with the correct permissions, already created in the user's namespace and ready to go;
(b) automatically create the spark service account in users' namespaces when the spark-k8s operator is deployed (which looks like a problematic can of worms to me); or
(c) provide instructions to the user about creating a suitable service account (least desirable option).

Barteus commented 2 years ago

A workaround: copy the spark-k8s service account from the kubeflow namespace:

kubectl get sa spark-k8s -n kubeflow -o yaml | sed 's/namespace: kubeflow/namespace: <NEW>/'  | kubectl create -f -

Replace <NEW> with your namespace name.
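For context, a SparkApplication created in the user's namespace would then reference the copied account through spec.driver.serviceAccount. A minimal sketch, with <NEW>, the image and the application details as placeholders:

# Sketch only: a SparkApplication in the user's namespace using the copied account.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: <NEW>
spec:
  type: Scala
  mode: cluster
  image: <spark-image>
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  sparkVersion: "3.1.1"
  driver:
    serviceAccount: spark-k8s   # the account copied from the kubeflow namespace
  executor:
    instances: 1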

ca-scribner commented 2 years ago

This workaround will only work for anyone who has access to the kubeflow namespace, so an admin or similar would have to apply it for general users.

DnPlas commented 2 years ago

Hi @Barteus, this is something we'd like to fix by providing the right RBAC to users, which is something the charm code has to figure out. If we copy the service account into each user namespace, we are granting access to resources and actions that are not necessarily related to Spark, so a better workaround would be to create a service account with access only to Spark objects (those defined in the CRDs). We can leave this bug open until we provide a good fix.
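A minimal sketch of what such a scoped setup could look like, assuming a hypothetical user namespace and placeholder names (the actual resources the charm creates may differ):

# Sketch only: per-namespace RBAC limited to the Spark Operator CRs.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-user
  namespace: <USER-NAMESPACE>
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-user
  namespace: <USER-NAMESPACE>
rules:
  # Only the objects defined by the Spark Operator CRDs, nothing else.
  - apiGroups: ["sparkoperator.k8s.io"]
    resources: ["sparkapplications", "scheduledsparkapplications"]
    verbs: ["create", "get", "list", "watch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-user
  namespace: <USER-NAMESPACE>
subjects:
  - kind: ServiceAccount
    name: spark-user
    namespace: <USER-NAMESPACE>
roleRef:
  kind: Role
  name: spark-user
  apiGroup: rbac.authorization.k8s.io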

ca-scribner commented 2 years ago

Do we know which of the following is the issue?

  1. that a service account called spark-k8s is required in the user namespace, but that we currently don't have one
  2. that users do not have the proper RBAC to create instances of the CRs that the Spark Operator deploys (e.g. SparkApplication or ScheduledSparkApplication)?
  3. something else?

I ask because, unless the problem is (1), I'm not sure how copying the ServiceAccount alone fixes it.

If this is actually an RBAC issue (2), I agree with @DnPlas that we can be a bit more specific with our fix (and I'm confused how copying the SA alone would fix anything, but I might be misunderstanding something). If we need RBAC assigned to users, I think the appropriate way to give users this RBAC would be either to:

a. (if deploying Spark on its own) make role bindings and a service account for these permissions and put them in Spark's namespace, or
b. (if deploying alongside Kubeflow, enabling all Kubeflow users to use Spark) create ClusterRoles with the desired user RBAC and use Kubeflow's role aggregation procedure to get them attached to all users.

To do (b), we create ClusterRoles with the rbac.authorization.kubeflow.org/aggregate-to-kubeflow-* label, as we do here. Ideally, these ClusterRoles would be created and managed by the charm operator itself, but the barrier for pod spec charms (including this Spark Operator) is that pod spec does not let us create arbitrary ClusterRoles. As a workaround, we created the kubeflow-roles-operator to (if I recall correctly...) manage roles for any legacy pod spec charms, with the intention of pulling those roles back out as we decommission those charms.
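To illustrate (b), a ClusterRole carrying that aggregation label might look roughly like this; the name and verb list are illustrative, and only the label is what Kubeflow's aggregation actually matches on:

# Sketch only: a ClusterRole picked up by Kubeflow's kubeflow-edit aggregation,
# giving Kubeflow users access to the Spark Operator CRs in their own namespaces.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: spark-kubeflow-edit
  labels:
    rbac.authorization.kubeflow.org/aggregate-to-kubeflow-edit: "true"
rules:
  - apiGroups: ["sparkoperator.k8s.io"]
    resources: ["sparkapplications", "scheduledsparkapplications"]
    verbs: ["create", "get", "list", "watch", "update", "patch", "delete"]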

So assuming this is an RBAC issue, I see two possible actions:

i-chvets commented 1 year ago

Spark integration is not supported. A new design is being introduced and integration with Spark will be different. This will need to be revisited and spec'ed out from the beginning. Closing.