GoogleCloudPlatform / kubeflow-distribution

Blueprints for Deploying Kubeflow on Google Cloud Platform and Anthos
Apache License 2.0

make hydrate fails with "no cluster named ..." #421

Closed edi-bice closed 1 year ago

edi-bice commented 1 year ago

ubuntu@primary:~/gcp-blueprints_kubeflow-saas-ml/kubeflow$ make hydrate
cluster_name_regex=^[a-z][-a-z0-9]{0,22}[a-z0-9]$
The kubeflow cluster name "kf17" is valid.
PROJECT=saas-ml-dev NAME=kf17 ./hack/check_domain_length.sh
Build directory: ./build
Component path: common/managed-storage
Apply component resources: common/managed-storage
Found Makefile, call 'make apply' of this component Makefile.
make[1]: Entering directory '/home/ubuntu/gcp-blueprints_kubeflow-saas-ml/kubeflow/common/managed-storage'
rm -rf ./build
mkdir -p ./build
kustomize build -o ./build/ .
make[1]: Leaving directory '/home/ubuntu/gcp-blueprints_kubeflow-saas-ml/kubeflow/common/managed-storage'
Build directory: ./build
Component path: common/cnrm
Apply component resources: common/cnrm
Found Makefile, call 'make apply' of this component Makefile.
make[1]: Entering directory '/home/ubuntu/gcp-blueprints_kubeflow-saas-ml/kubeflow/common/cnrm'
rm -rf ./build && mkdir -p ./build
kustomize build -o ./build ./
make[1]: Leaving directory '/home/ubuntu/gcp-blueprints_kubeflow-saas-ml/kubeflow/common/cnrm'
Build directory: ./build
Component path: asm
Apply component resources: asm
Found Makefile, call 'make apply' of this component Makefile.
make[1]: Entering directory '/home/ubuntu/gcp-blueprints_kubeflow-saas-ml/kubeflow/asm'
curl https://storage.googleapis.com/csm-artifacts/asm/asmcli_1.16.2-asm.2-config1 > asmcli;
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 195k 100 195k 0 0 37757 0 0:00:05 0:00:05 --:--:-- 40760
chmod +x asmcli
rm -rf asm.tar.gz
curl -LJ https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages/archive/refs/tags/1.16.2-asm.2+config1.tar.gz -o asm.tar.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:05 --:--:-- 0
100 239k 0 239k 0 0 23236 0 --:--:-- 0:00:10 --:--:-- 62686
rm -rf ./package
mkdir ./package
tar -xf asm.tar.gz --strip-components=1 -C ./package
./asmcli validate \
  --project_id saas-ml-dev \
  --cluster_name kf17 \
  --cluster_location us-central1-a \
  --output_dir ./package \
  --ca mesh_ca \
  --custom_overlay ./package/asm/istio/options/iap-operator.yaml \
  --custom_overlay ./options/ingressgateway-iap.yaml \
  --option legacy-default-ingressgateway \
  --verbose
asmcli: Setting up necessary files...
asmcli: Using /home/ubuntu/gcp-blueprints_kubeflow-saas-ml/kubeflow/asm/package/asm_kubeconfig as the kubeconfig...
asmcli: Checking installation tool dependencies...
asmcli: Fetching/writing GCP credentials to kubeconfig file...
asmcli: Running: '/usr/bin/gcloud container clusters get-credentials kf17 --project=saas-ml-dev --zone=us-central1-a'
asmcli: -------------
Fetching cluster endpoint and auth data.
ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=404, message=Not found: projects/saas-ml-dev/zones/us-central1-a/clusters/kf17. No cluster named 'kf17' in saas-ml-dev.
asmcli: [WARNING]: Failed, retrying...(1 of 2)
asmcli: Running: '/usr/bin/gcloud container clusters get-credentials kf17 --project=saas-ml-dev --zone=us-central1-a'
asmcli: -------------
Fetching cluster endpoint and auth data.
ERROR: (gcloud.container.clusters.get-credentials) ResponseError: code=404, message=Not found: projects/saas-ml-dev/zones/us-central1-a/clusters/kf17. No cluster named 'kf17' in saas-ml-dev.
asmcli: [WARNING]: Failed, retrying...(2 of 2)
asmcli: [WARNING]: Command 'gcloud container clusters get-credentials kf17 --project=saas-ml-dev --zone=us-central1-a' failed.
make[1]: [Makefile:62: asmcli-validate] Error 1
make[1]: Leaving directory '/home/ubuntu/gcp-blueprints_kubeflow-saas-ml/kubeflow/asm'
make: [Makefile:77: hydrate] Error 1

Linchin commented 1 year ago

Hi @edi-bice, thank you for reporting the issue! It looks like asmcli cannot find a cluster named kf17. Could you please verify that the cluster exists and that the machine running the command is authorized to access kf17? Thank you for your help!

Linchin commented 1 year ago

I was not able to reproduce the issue; make hydrate runs without errors on my instance. @edi-bice, I suggest you verify that:

  1. a cluster with the configured name (kf17) exists in the project
  2. the location (region or zone) configured for the Kubeflow deployment exactly matches the cluster's actual location
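
The two checks above can be scripted. The following is a minimal sketch, assuming the gcloud CLI is installed and authenticated for the project; `check_cluster` is a hypothetical helper name and is not part of this repository. The project, cluster name, and zone in the usage note are the values from the log in this issue.

```shell
#!/usr/bin/env bash
# Sketch only: check_cluster is a hypothetical helper, not part of the blueprint.
# It checks that a GKE cluster with the given name exists in the project and
# that it lives in the expected location (zone or region).

check_cluster() {
  local project="$1" name="$2" location="$3"
  local found

  # List the location of every cluster with this name, one per line.
  found="$(gcloud container clusters list \
    --project="$project" \
    --filter="name=$name" \
    --format="value(location)")" || return 2

  if [ -z "$found" ]; then
    echo "No cluster named '$name' in project '$project'."
    return 1
  fi

  # The configured location must match one of the listed locations exactly.
  if ! printf '%s\n' "$found" | grep -qx -- "$location"; then
    echo "Cluster '$name' exists, but in: $found (expected $location)."
    return 1
  fi

  echo "Cluster '$name' found in $location."
}
```

For the values in the log, the call would be `check_cluster saas-ml-dev kf17 us-central1-a`. A non-zero exit status means either that no cluster with that name is visible to the active account, or that it exists in a different zone or region than the one passed to asmcli.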

I will close this issue. Please reopen it if you have any other questions.