GoogleCloudPlatform / kubeflow-distribution

Blueprints for Deploying Kubeflow on Google Cloud Platform and Anthos
Apache License 2.0
78 stars 63 forks source link

Default workload identity bindings for deployments that need GCP permissions #61

Open Bobgy opened 4 years ago

Bobgy commented 4 years ago

I've found quite a few deployments don't work properly due to lack of GCP permissions, including:

/cc @jlewi

I think we should bind workload identity with these KSAs by default.

Proposal

Add workload identity binding for

default-editor will be automatically bound to user GSA when profiles manager works properly.

@jlewi I can implement this, thoughts?

issue-label-bot[bot] commented 4 years ago

Issue-Label Bot is automatically applying the labels:

Label Probability
kind/bug 0.82

Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback! Links: app homepage, dashboard and code for this bot.

Bobgy commented 4 years ago

/assign @Bobgy

jlewi commented 4 years ago

@Bobgy Isn't the profiles-manager already bound to a GSA?

jlewi commented 4 years ago

I looked at one of the auto deployments

It looks like profiles manager is using KSA: serviceAccount: profiles-controller-service-account

Which doesn't have a GSA

kubectl -n kubeflow get serviceAccount -o yaml profiles-controller-service-account
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"labels":{"kustomize.component":"profiles"},"name":"profiles-controller-service-account","namespace":"kubeflow"}}
  creationTimestamp: "2020-06-23T11:45:21Z"
  labels:
    kustomize.component: profiles
  name: profiles-controller-service-account
  namespace: kubeflow
  resourceVersion: "3510"
  selfLink: /api/v1/namespaces/kubeflow/serviceaccounts/profiles-controller-service-account
  uid: 04de24e6-b547-11ea-8696-42010a800149
secrets:
- name: profiles-controller-service-account-token-bvcxp

I think this is a bug. If the profiles manager isn't bound to any GSA it won't be able to attach GSA's to KSAs in newly provisioned namespaces.

Looking at a v1 deployment it looks like it is bound to the admin account

kubectl --context=kf-v1-0210 -n kubeflow  get serviceaccount -o yaml profiles-controller-service-account
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    iam.gke.io/gcp-service-account: kf-v1-0210-admin@jlewi-dev.iam.gserviceaccount.com
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"ServiceAccount","metadata":{"annotations":{},"labels":{"app.kubernetes.io/component":"profiles","app.kubernetes.io/instance":"profiles-v1.0.0","app.kubernetes.io/managed-by":"kfctl","app.kubernetes.io/name":"profiles","app.kubernetes.io/part-of":"kubeflow","app.kubernetes.io/version":"v1.0.0","kustomize.component":"profiles"},"name":"profiles-controller-service-account","namespace":"kubeflow"}}
  creationTimestamp: "2020-02-10T19:12:36Z"
  labels:
    app.kubernetes.io/component: profiles
    app.kubernetes.io/instance: profiles-v1.0.0
    app.kubernetes.io/managed-by: kfctl
    app.kubernetes.io/name: profiles
    app.kubernetes.io/part-of: kubeflow
    app.kubernetes.io/version: v1.0.0
    kustomize.component: profiles
  name: profiles-controller-service-account
  namespace: kubeflow
  resourceVersion: "5691"
  selfLink: /api/v1/namespaces/kubeflow/serviceaccounts/profiles-controller-service-account
  uid: 4c172377-4c39-11ea-86b4-42010a8e01a3
secrets:
- name: profiles-controller-service-account-token-rxm4l
Bobgy commented 4 years ago

that's true, It was originaly bound by kfctl

We need to bind It by kcc now

jlewi commented 4 years ago

Thanks @Bobgy that makes sense; thanks for reporting and fixing this

jlewi commented 4 years ago

@Bobgy Lets not close this issue until

  1. kubeflow/manifests#1317 has been cherry-picked onto the 1.1 branch
  2. We have verified in the auto-deployments that the issue is fixed.
jlewi commented 4 years ago

@Bobgy Are we going to have to apply the same change as #64 to the auto-deploy script.

https://github.com/kubeflow/testing/blob/637d1cd5fe33d03ee5646380d960fabe8a230d0a/py/kubeflow/testing/create_kf_from_gcp_blueprint.py#L130

issue-label-bot[bot] commented 4 years ago

Issue-Label Bot is automatically applying the labels:

Label Probability
area/kfctl 0.64

Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback! Links: app homepage, dashboard and code for this bot.

Bobgy commented 4 years ago

The most recent deploy "kf-vbp-0629-571" runs pipelines successfully, but workload identity binding is still not properly setup.

$ k get sa pipeline-runner -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  annotations:
    iam.gke.io/gcp-service-account: name-user@project-id.iam.gserviceaccount.com
  name: pipeline-runner
  namespace: kubeflow
  ...
Bobgy commented 4 years ago

I'm guessing the change https://github.com/kubeflow/testing/pull/705 didn't make it into the auto deploy. Is there a way we can trigger a manual deploy?

Bobgy commented 4 years ago

Looked at the tekton task https://kf-ci-v1.endpoints.kubeflow-ci.cloud.goog/tekton/#/namespaces/auto-deploy/pipelineruns/deploy-kf-master-p5sb2, it's using

params:
  - name: revision
    value: gcp_blueprints
  - name: url
    value: 'https://github.com/jlewi/testing.git'
type: git

@jlewi Can you fix this?

jlewi commented 4 years ago

This is currently coming from here: https://github.com/kubeflow/testing/blob/88c153fb7ab84ea08c5ea7cf21fbf0a30b7814d3/test-infra/auto-deploy/manifest/config/pipeline-run-deploy-blueprint-master.yaml#L35

jlewi commented 4 years ago

The current auto-deployment was deployed before kubeflow/testing#707 was merged.

gcp-blueprint-master    kf-vbp-0629-3f5 8:28:38.534253  deploy-kf-master-rpgt4  7cfd7bc     gcloud --project=kubeflow-ci-deployment container clusters get-credentials --region=us-central1 kf-vbp-0629-3f5

The next auto-deployment will hopefully have the fix.

Bobgy commented 4 years ago

Thanks for the heads up! I'll wait for the next one then

Bobgy commented 4 years ago

This is working properly now. I verified in https://kf-vbp-0630-8f5.endpoints.kubeflow-ci-deployment.cloud.goog/_/pipeline/#/runs/details/c7af3e03-87ab-4a6b-b30c-20d8b9a5a8ea, the tfx taxi sample ran successfully.

Bobgy commented 4 years ago

I found a problem, default-editor KSA in user namespaces didn't get correct workload identity binding.

Root cause: I used IAMPolicy for workload identity bindings for kf-user GSA. However, in multi user mode, profile controller also tries to add additional workload identity bindings to kf-user GSA. Then, profile controller and KCC controller will periodically overwrite each other's IAM policies.

We should use IAMPolicyMember instead.

jlewi commented 4 years ago

@Bobgy Thanks; we will need to cherry-pick this into the 1.0 branch of manifests.

Bobgy commented 4 years ago

We should verify permissions of notebooks whether this is good in auto deploys before closing this.