GoogleCloudPlatform / kubeflow-distribution

Blueprints for Deploying Kubeflow on Google Cloud Platform and Anthos
Apache License 2.0

Explore Anthos Bare Metal integration https://cloud.google.com/anthos/clusters/docs/bare-metal/1.8/try/gce-vms-tf #329

Open zijianjoy opened 2 years ago

zijianjoy commented 2 years ago
sudo su

sudo ./run_initialization_checks.sh && \
  sudo bmctl create config -c anthos-gce-cluster && \
  sudo cp ./anthos-gce-cluster.yaml bmctl-workspace/anthos-gce-cluster && \
  sudo bmctl create cluster -c anthos-gce-cluster
zijianjoy commented 2 years ago

Error:

create kind cluster failed: error validating cluster config: 3 errors occurred:
        * GKERegister check failed: 2 errors occurred:
        * Get "https://gkehub.googleapis.com/v1beta1/projects/jamxl-kfp-dev/locations/global/memberships/anthos-gce-cluster": oauth2: cannot fetch token: 400 Bad Request
Response: {"error":"invalid_grant","error_description":"Invalid JWT Signature."}
        * invalid/expired service account key for "baremetal-gcr@jamxl-kfp-dev.iam.gserviceaccount.com": please check service account at Google Cloud Console -> IAM & Admin -> Service Accounts

        * ClusterOperations check failed: invalid/expired service account key for "baremetal-gcr@jamxl-kfp-dev.iam.gserviceaccount.com": please check service account at Google Cloud Console -> IAM & Admin -> Service Accounts
        * GCR pull permission for bucket: artifacts.anthos-baremetal-release.appspot.com failed: invalid/expired service account key for "baremetal-gcr@jamxl-kfp-dev.iam.gserviceaccount.com": please check service account at Google Cloud Console -> IAM & Admin -> Service Accounts

[2021-10-11 04:26:58+0000] Deleting bootstrap cluster... OK
[2021-10-11 04:26:58+0000] Error creating cluster: create kind cluster failed: error validating cluster config: 3 errors occurred:
        * GKERegister check failed: 2 errors occurred:
        * Get "https://gkehub.googleapis.com/v1beta1/projects/jamxl-kfp-dev/locations/global/memberships/anthos-gce-cluster": oauth2: cannot fetch token: 400 Bad Request
Response: {"error":"invalid_grant","error_description":"Invalid JWT Signature."}
        * invalid/expired service account key for "baremetal-gcr@jamxl-kfp-dev.iam.gserviceaccount.com": please check service account at Google Cloud Console -> IAM & Admin -> Service Accounts

        * ClusterOperations check failed: invalid/expired service account key for "baremetal-gcr@jamxl-kfp-dev.iam.gserviceaccount.com": please check service account at Google Cloud Console -> IAM & Admin -> Service Accounts
        * GCR pull permission for bucket: artifacts.anthos-baremetal-release.appspot.com failed: invalid/expired service account key for "baremetal-gcr@jamxl-kfp-dev.iam.gserviceaccount.com": please check service account at Google Cloud Console -> IAM & Admin -> Service Accounts

Error: create kind cluster failed: error validating cluster config: 3 errors occurred:
        * GKERegister check failed: 2 errors occurred:
        * Get "https://gkehub.googleapis.com/v1beta1/projects/jamxl-kfp-dev/locations/global/memberships/anthos-gce-cluster": oauth2: cannot fetch token: 400 Bad Request
Response: {"error":"invalid_grant","error_description":"Invalid JWT Signature."}
        * invalid/expired service account key for "baremetal-gcr@jamxl-kfp-dev.iam.gserviceaccount.com": please check service account at Google Cloud Console -> IAM & Admin -> Service Accounts

        * ClusterOperations check failed: invalid/expired service account key for "baremetal-gcr@jamxl-kfp-dev.iam.gserviceaccount.com": please check service account at Google Cloud Console -> IAM & Admin -> Service Accounts
        * GCR pull permission for bucket: artifacts.anthos-baremetal-release.appspot.com failed: invalid/expired service account key for "baremetal-gcr@jamxl-kfp-dev.iam.gserviceaccount.com": please check service account at Google Cloud Console -> IAM & Admin -> Service Accounts

Usage:
  bmctl create cluster [flags]

Flags:
      --bootstrap-cluster-pod-cidr string       Bootstrap cluster pod CIDR (default "192.168.122.0/24")
      --bootstrap-cluster-service-cidr string   Bootstrap cluster service CIDR (default "10.96.0.0/27")
  -c, --cluster cluster name                    Cluster name, cluster config is expected to be placed under <workspace dir>/<cluster name>/<cluster name>.yaml (default )
      --force                                   If true, ignore errors from preflight checks and validation except for GCP check errors.
  -h, --help                                    help for cluster
      --ignore-validation-errors                A validation error override, allowing to proceed despite the validation errors.
      --kubeconfig string                       The path to the kubeconfig file for the admin cluster
      --reuse-bootstrap-cluster                 If true, use existing bootstrap cluster.

Global Flags:
      --stderrthreshold severity            logs at or above this threshold go to stderr (default 3, possible values 0 to 3, corresponding to severity levels INFO, WARNING, ERROR, and FATAL)
  -v, --v Level                             number for the log level verbosity (default 1, possible values 0 to 5 (though there is no strict max), increase value to see more verbose logs)
      --workspace-dir workspace directory   bmctl workspace directory path (default "bmctl-workspace" under the current directory)

F1011 04:26:58.204666   14144 root.go:28] 
zijianjoy commented 2 years ago
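# Create a fresh key for the baremetal-gcr service account, since the
# preflight checks reported its key as invalid/expired.
export PROJECT_ID=$(gcloud config get-value project)

gcloud iam service-accounts keys create bm-gcr.json \
  --iam-account=baremetal-gcr@${PROJECT_ID}.iam.gserviceaccount.com

# Place the new key at /root/bm-gcr.json.
mv ./bm-gcr.json /root/bm-gcr.json

# Retry cluster creation.
sudo bmctl create cluster -c anthos-gce-cluster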
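
Before rerunning bmctl, a quick local sanity check on the key file might save a round trip. The sketch below is only an illustration: check_sa_key is a hypothetical helper (not part of bmctl or gcloud), and it assumes the standard JSON key layout with client_email, private_key and private_key_id fields. It does not verify the signature with Google, so a key that was deleted on the server side would still pass.

```shell
# Hypothetical helper (not part of bmctl or gcloud): local sanity check of a
# service account key file. Assumes the standard JSON key layout.
check_sa_key() {
  key_file="$1"        # e.g. /root/bm-gcr.json
  expected_email="$2"  # e.g. baremetal-gcr@${PROJECT_ID}.iam.gserviceaccount.com

  # The file must exist and be non-empty.
  if [ ! -s "$key_file" ]; then
    echo "key file missing or empty: $key_file" >&2
    return 1
  fi

  # A usable key file carries a private_key field.
  if ! grep -q '"private_key"' "$key_file"; then
    echo "no private_key field in $key_file" >&2
    return 1
  fi

  # The key must belong to the expected service account (fixed-string match).
  if ! grep -qF "\"$expected_email\"" "$key_file"; then
    echo "key is not for $expected_email" >&2
    return 1
  fi

  # Print the key ID so it can be compared with the keys listed for the account.
  sed -n 's/.*"private_key_id": *"\([^"]*\)".*/\1/p' "$key_file"
}
```

Possible usage: run check_sa_key /root/bm-gcr.json "baremetal-gcr@${PROJECT_ID}.iam.gserviceaccount.com" and compare the printed key ID with the output of gcloud iam service-accounts keys list --iam-account=baremetal-gcr@${PROJECT_ID}.iam.gserviceaccount.com. If the ID is not in that list, the key file is stale and needs to be regenerated.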
zijianjoy commented 2 years ago

IMPORTANT

Creating a bare metal cluster will cause AI Platform Hosted Pipelines to fail, because it doesn't have access to the keystore key external_cluster_credential_wrapping_key. Be careful before creating a bare metal cluster.