GoogleCloudPlatform / kubeflow-distribution

Blueprints for Deploying Kubeflow on Google Cloud Platform and Anthos
Apache License 2.0

Pull&Push Container Image From Artifact Registry #430

Open oguzhanoyan opened 1 year ago

oguzhanoyan commented 1 year ago

As you know, Container Registry is deprecated and will no longer be accessible. We need to pull and push images from the ubuntu_containerd Kubeflow node. To pull and push images to Artifact Registry from the node, the following is required:

  1. The node pool must have sufficient OAuth scopes, namely storage-rw and cloud-platform.
  2. The node pool's service account (kubeflow-vm) must have sufficient permissions.
  3. A token must be created on the node.
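Step 3 is the least obvious of the three. One possible reading, as a rough sketch: fetch an access token for the node's service account from the metadata server and pass it to `ctr` when pushing. The function names, the `k8s.io` containerd namespace, and the example repository values are assumptions for illustration and are not taken from the setup described here.

```shell
#!/bin/sh
# Sketch only: run on the GKE node itself. All names below are hypothetical.
set -eu

# Build an Artifact Registry reference: LOCATION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG
# Arguments: location project repo image tag
ar_image_ref() {
  printf '%s-docker.pkg.dev/%s/%s/%s:%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Ask the metadata server for an access token of the node's service account
# and extract the "access_token" field from the JSON response.
node_access_token() {
  curl -s -H "Metadata-Flavor: Google" \
    "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token" \
    | sed -n 's/.*"access_token" *: *"\([^"]*\)".*/\1/p'
}

# Push an image that already exists in containerd (assumed namespace: k8s.io).
# Argument: full image reference
push_from_node() {
  sudo ctr --namespace k8s.io images push \
    --user "oauth2accesstoken:$(node_access_token)" "$1"
}

# Example, with made-up values:
# push_from_node "$(ar_image_ref us-central1 my-project my-repo my-image v1)"
```

Note that the token ends up on the `ctr` command line, so this is only suitable for a quick manual test.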

For step 1, I created a new node pool with sufficient OAuth scopes using gcloud:

gcloud beta container --project "xxx" node-pools create "pool-1" \
  --cluster "xxxx" --region "xxx" \
  --node-version "1.25.9-gke.2300" \
  --machine-type "e2-medium" \
  --image-type "UBUNTU_CONTAINERD" \
  --disk-type "pd-balanced" --disk-size "100" \
  --metadata disable-legacy-endpoints=true \
  --service-account "xxxx-vm@xxxx.iam.gserviceaccount.com" \
  --spot --num-nodes "1" \
  --enable-autoupgrade --enable-autorepair \
  --scopes=storage-rw,cloud-platform \
  --max-surge-upgrade 1 --max-unavailable-upgrade 0
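To confirm that the new pool actually received the scopes, its config can be read back. A minimal sketch, reusing the placeholder names from the command above; `has_scope` is a hypothetical helper, and the `describe` call is left commented out because it needs real values.

```shell
#!/bin/sh
set -eu

# Hypothetical helper: succeeds if the scope list in $1 mentions the scope in $2.
# It does a plain substring match, so the list separator does not matter.
has_scope() {
  case "$1" in
    *"$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read the scopes back from the node pool (placeholders as in the create command):
# scopes="$(gcloud container node-pools describe "pool-1" \
#   --project "xxx" --cluster "xxxx" --region "xxx" \
#   --format="value(config.oauthScopes)")"
# has_scope "$scopes" "auth/devstorage.read_write" || echo "storage-rw scope missing"
# has_scope "$scopes" "auth/cloud-platform" || echo "cloud-platform scope missing"
```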

I then granted the Artifact Registry writer permission to the "xxxx-vm@xxxx.iam.gserviceaccount.com" service account, and after that I managed to push the image from the node. I don't know who else needs this, but adding these scopes and permissions to the installation YAMLs should solve the problem. My proposal is:

  1. Add the https://www.googleapis.com/auth/devstorage.read_write OAuth scope to the ContainerCluster resource instead of devstorage.read_only:

    apiVersion: container.cnrm.cloud.google.com/v1beta1
    kind: ContainerCluster
    metadata:
      labels:
        mesh_id: "proj-PROJECT_NUMBER" # kpt-set: proj-${gcloud.project.projectNumber}
      name: KUBEFLOW-NAME # kpt-set: ${name}
    spec:
      initialNodeCount: 2
      addonsConfig:
        httpLoadBalancing:
          disabled: false
      clusterAutoscaling:
        enabled: true
        autoProvisioningDefaults:
          oauthScopes:
          - https://www.googleapis.com/auth/logging.write
          - https://www.googleapis.com/auth/monitoring
          # - https://www.googleapis.com/auth/devstorage.read_only
          - https://www.googleapis.com/auth/devstorage.read_write
          serviceAccountRef:
            name: KUBEFLOW-NAME-vm # kpt-set: ${name}-vm
        resourceLimits:
        - resourceType: cpu
          maximum: 128
        - resourceType: memory
          maximum: 2000
        - resourceType: nvidia-tesla-k80
          maximum: 16
      releaseChannel:
        # Per https://github.com/GoogleCloudPlatform/k8s-config-connector/issues/194
        # use upper case for the channels
        channel: STABLE
      # Master version controls the kubernetes API version.
      # Future upgrade of master version might break Kubeflow deployment.
      # At the time of writing, STABLE release channel is using 1.17.
      # knative requires Kubernetes version to be 1.18+. Therefore we set
      # master version below. We need to make sure reviewing this for future release.
      # Autopilot mode will enforce automatic node upgrade.
      # minMasterVersion: '1.18'
      location: LOCATION # kpt-set: ${location}
      workloadIdentityConfig:
        identityNamespace: PROJECT.svc.id.goog # kpt-set: ${gcloud.core.project}.svc.id.goog
      loggingService: logging.googleapis.com/kubernetes
      monitoringService: monitoring.googleapis.com/kubernetes
      nodeConfig:
        machineType: e2-standard-8
        metadata:
          disable-legacy-endpoints: "true"
        oauthScopes:
        - https://www.googleapis.com/auth/logging.write
        - https://www.googleapis.com/auth/monitoring
        # - https://www.googleapis.com/auth/devstorage.read_only
        - https://www.googleapis.com/auth/devstorage.read_write
        serviceAccountRef:
          name: KUBEFLOW-NAME-vm # kpt-set: ${name}-vm
        workloadMetadataConfig:
          nodeMetadata: GKE_METADATA_SERVER

  2. Add the Artifact Registry writer permission to the "xxxx-vm@xxxx.iam.gserviceaccount.com" service account:

apiVersion: iam.cnrm.cloud.google.com/v1beta1
kind: IAMPolicyMember
metadata:
  name: KUBEFLOW-NAME-vm-policy-storage # kpt-set: ${name}-vm-policy-storage
spec:
  member: serviceAccount:KUBEFLOW-NAME-vm@PROJECT.iam.gserviceaccount.com # kpt-set: serviceAccount:${name}-vm@${gcloud.core.project}.iam.gserviceaccount.com
  role: roles/artifactregistry.createOnPushRepoAdmin
  resourceRef:
    apiVersion: resourcemanager.cnrm.cloud.google.com/v1beta1
    kind: Project
    external: projects/PROJECT # kpt-set: projects/${gcloud.core.project}
---
apiVersion: iam.cnrm.cloud.google.com/v1beta1
kind: IAMPolicyMember
metadata:
  name: KUBEFLOW-NAME-vm-policy-storage # kpt-set: ${name}-vm-policy-storage
spec:
  member: serviceAccount:KUBEFLOW-NAME-vm@PROJECT.iam.gserviceaccount.com # kpt-set: serviceAccount:${name}-vm@${gcloud.core.project}.iam.gserviceaccount.com
  role: roles/artifactregistry.createOnPushWriter
  resourceRef:
    apiVersion: resourcemanager.cnrm.cloud.google.com/v1beta1
    kind: Project
    external: projects/PROJECT # kpt-set: projects/${gcloud.core.project}
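
To try the binding by hand before changing the manifests, the equivalent `gcloud` call could look like the sketch below. `vm_sa_member` and `grant_role` are hypothetical helper names, and the example values are made up. The role is passed in as an argument, so either of the roles above or `roles/artifactregistry.writer` can be used.

```shell
#!/bin/sh
set -eu

# Build the IAM member string for the Kubeflow VM service account,
# following the ${name}-vm@${project} pattern used in the manifests.
# Arguments: kubeflow-name project-id
vm_sa_member() {
  printf 'serviceAccount:%s-vm@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Grant a project-level role to a member.
# Arguments: project-id member role
grant_role() {
  gcloud projects add-iam-policy-binding "$1" \
    --member="$2" \
    --role="$3"
}

# Example, with made-up values:
# grant_role "my-project" "$(vm_sa_member kubeflow my-project)" \
#   "roles/artifactregistry.createOnPushWriter"
```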