GoogleCloudPlatform / kubeflow-distribution

Blueprints for Deploying Kubeflow on Google Cloud Platform and Anthos
Apache License 2.0

Stuck in infinite loop while deploying Kubeflow cluster - kpt pkg get --auto-set=false #452

Open leetdavid opened 1 week ago

leetdavid commented 1 week ago

I am following the instructions on this page: https://googlecloudplatform.github.io/kubeflow-gke-docs/dev/docs/deploy/deploy-cli/#deploy-kubeflow

When I run make apply, the output loops indefinitely on these lines:

asmcli: Downloading ASM kpt package...
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------

...

Full logs:

make apply
cluster_name_regex=^[a-z][-a-z0-9]{0,22}[a-z0-9]$
The kubeflow cluster name "kubeflow" is valid.
PROJECT=kubeflow-project NAME=kubeflow ./hack/check_domain_length.sh
Build directory: ./build
Component path: common/managed-storage
Apply component resources: common/managed-storage
Found Makefile, call 'make apply' of this component Makefile.
rm -rf ./build
mkdir -p ./build
kustomize build -o ./build/ .
kubectl --context=kubeflow-mgmt apply -f ./build
sqlinstance.sql.cnrm.cloud.google.com/kubeflow-kfp unchanged
storagebucket.storage.cnrm.cloud.google.com/kubeflow-project-kfp unchanged
# Wait for all Google Cloud resources to get created and become ready.
# If this takes long, you can view status by:
cd common/managed-storage && make status
# For resources with READY=False, debug by:
kubectl --context=kubeflow-mgmt -n kubeflow-project describe <KIND>/<NAME>

kubectl --context=kubeflow-mgmt wait --for=condition=Ready --timeout=100s -f ./build \
        || kubectl --context=kubeflow-mgmt get -f ./build
sqlinstance.sql.cnrm.cloud.google.com/kubeflow-kfp condition met
storagebucket.storage.cnrm.cloud.google.com/kubeflow-project-kfp condition met
kubectl --context=kubeflow-mgmt wait --for=condition=Ready --timeout=500s -f ./build
sqlinstance.sql.cnrm.cloud.google.com/kubeflow-kfp condition met
storagebucket.storage.cnrm.cloud.google.com/kubeflow-project-kfp condition met
Build directory: ./build
Component path: common/cnrm
Apply component resources: common/cnrm
Found Makefile, call 'make apply' of this component Makefile.
echo ./build
./build
/Applications/Xcode.app/Contents/Developer/usr/bin/make apply-cnrm build_dir=./build
rm -rf ./build && mkdir -p ./build
kustomize build -o ./build ./
kubectl --context=kubeflow-mgmt apply -f ./build
computeaddress.compute.cnrm.cloud.google.com/kubeflow-ip unchanged
containercluster.container.cnrm.cloud.google.com/kubeflow unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-bigquery unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-cloudbuild unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-cloudsql unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-dataflow unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-dataproc unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-istio-wi unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-kubeflow-wi unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-logging unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-manages-user unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-metricwriter unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-ml unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-monitoringviewer unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-network unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-servicemanagement unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-source unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-storage unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-viewer unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-workload-identity-user unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-bigquery unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-cloudbuild unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-cloudsql unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-dataflow unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-dataproc unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-logging unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-metricwriter unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-ml unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-monitoringviewer unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-source unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-storage unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-viewer unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-workload-identity-user-ml-pipeline-ui unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-workload-identity-user-ml-pipeline-visualizationserver unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-workload-identity-user-pipeline-runner unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-vm-logging unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-vm-policy-cloudtrace unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-vm-policy-meshtelemetry unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-vm-policy-monitoring-viewer unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-vm-policy-monitoring unchanged
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-vm-policy-storage unchanged
iamserviceaccount.iam.cnrm.cloud.google.com/kubeflow-admin unchanged
iamserviceaccount.iam.cnrm.cloud.google.com/kubeflow-user unchanged
iamserviceaccount.iam.cnrm.cloud.google.com/kubeflow-vm unchanged
service.serviceusage.cnrm.cloud.google.com/anthos.googleapis.com unchanged
service.serviceusage.cnrm.cloud.google.com/cloudbuild.googleapis.com unchanged
service.serviceusage.cnrm.cloud.google.com/cloudresourcemanager.googleapis.com unchanged
service.serviceusage.cnrm.cloud.google.com/compute.googleapis.com unchanged
service.serviceusage.cnrm.cloud.google.com/container.googleapis.com unchanged
service.serviceusage.cnrm.cloud.google.com/gkeconnect.googleapis.com unchanged
service.serviceusage.cnrm.cloud.google.com/gkehub.googleapis.com unchanged
service.serviceusage.cnrm.cloud.google.com/iamcredentials.googleapis.com unchanged
service.serviceusage.cnrm.cloud.google.com/iap.googleapis.com unchanged
service.serviceusage.cnrm.cloud.google.com/logging.googleapis.com unchanged
service.serviceusage.cnrm.cloud.google.com/meshca.googleapis.com unchanged
service.serviceusage.cnrm.cloud.google.com/meshconfig.googleapis.com unchanged
service.serviceusage.cnrm.cloud.google.com/meshtelemetry.googleapis.com unchanged
service.serviceusage.cnrm.cloud.google.com/monitoring.googleapis.com unchanged
service.serviceusage.cnrm.cloud.google.com/servicemanagement.googleapis.com unchanged
/Applications/Xcode.app/Contents/Developer/usr/bin/make wait-gcp
# Wait for all Google Cloud resources to get created and become ready.
Waiting for iamserviceaccount resources...
iamserviceaccount.iam.cnrm.cloud.google.com/kubeflow-admin condition met
iamserviceaccount.iam.cnrm.cloud.google.com/kubeflow-user condition met
iamserviceaccount.iam.cnrm.cloud.google.com/kubeflow-vm condition met
Waiting for iampolicymember resources...
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-bigquery condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-cloudbuild condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-cloudsql condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-dataflow condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-dataproc condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-istio-wi condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-kubeflow-wi condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-logging condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-manages-user condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-metricwriter condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-ml condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-monitoringviewer condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-network condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-servicemanagement condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-source condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-storage condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-viewer condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-admin-workload-identity-user condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-bigquery condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-cloudbuild condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-cloudsql condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-dataflow condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-dataproc condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-logging condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-metricwriter condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-ml condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-monitoringviewer condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-source condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-storage condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-viewer condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-workload-identity-user-ml-pipeline-ui condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-workload-identity-user-ml-pipeline-visualizationserver condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-user-workload-identity-user-pipeline-runner condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-vm-logging condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-vm-policy-cloudtrace condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-vm-policy-meshtelemetry condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-vm-policy-monitoring condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-vm-policy-monitoring-viewer condition met
iampolicymember.iam.cnrm.cloud.google.com/kubeflow-vm-policy-storage condition met
Waiting for computeaddress resources...
computeaddress.compute.cnrm.cloud.google.com/kubeflow-ip condition met
Waiting for containercluster resources...
containercluster.container.cnrm.cloud.google.com/kubeflow condition met
/Applications/Xcode.app/Contents/Developer/usr/bin/make create-ctxt
PROJECT=kubeflow-project \
       REGION=asia-southeast1-a \
       NAME=kubeflow ../../hack/create_context.sh
+ kubectl config delete-context kubeflow
warning: this removed your active context, use "kubectl config use-context" to select a different one
deleted context kubeflow from /Users/david/.kube/config
+ set -ex
+ NAMESPACE=kubeflow
+ gcloud --project=kubeflow-project container clusters get-credentials --region=asia-southeast1-a kubeflow
Fetching cluster endpoint and auth data.
kubeconfig entry generated for kubeflow.
++ kubectl config current-context
+ kubectl config rename-context gke_kubeflow-project_asia-southeast1-a_kubeflow kubeflow
Context "gke_kubeflow-project_asia-southeast1-a_kubeflow" renamed to "kubeflow".
+ kubectl config set-context --current --namespace=kubeflow
Context "kubeflow" modified.
Build directory: ./build
Component path: asm
Apply component resources: asm
Found Makefile, call 'make apply' of this component Makefile.
curl https://storage.googleapis.com/csm-artifacts/asm/asmcli_1.16.2-asm.2-config1 > asmcli;
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  195k  100  195k    0     0   231k      0 --:--:-- --:--:-- --:--:--  231k
chmod +x asmcli
rm -rf asm.tar.gz
curl -LJ https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages/archive/refs/tags/1.16.2-asm.2+config1.tar.gz -o asm.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  239k    0  239k    0     0   237k      0 --:--:--  0:00:01 --:--:-- 1508k
rm -rf ./package
mkdir ./package
tar -xf asm.tar.gz --strip-components=1 -C ./package
./asmcli install \
    --project_id kubeflow-project \
    --cluster_name kubeflow \
    --cluster_location asia-southeast1-a \
    --output_dir ./package \
    --enable_all \
    --ca mesh_ca \
    --custom_overlay ./package/asm/istio/options/iap-operator.yaml \
    --custom_overlay ./options/ingressgateway-iap.yaml \
    --option legacy-default-ingressgateway \
    --verbose
asmcli: Setting up necessary files...
asmcli: Using /Users/david/developer/kubeflow-distribution/kubeflow/asm/package/asm_kubeconfig as the kubeconfig...
asmcli: Checking installation tool dependencies...
asmcli: [WARNING]: Installation is only supported on x86_64.
asmcli: Fetching/writing GCP credentials to kubeconfig file...
asmcli: Running: '/opt/homebrew/share/google-cloud-sdk/bin/gcloud container clusters get-credentials kubeflow --project=kubeflow-project --zone=asia-southeast1-a'
asmcli: -------------
Fetching cluster endpoint and auth data.
kubeconfig entry generated for kubeflow.
asmcli: Running: '/opt/homebrew/share/google-cloud-sdk/bin/kubectl --kubeconfig /Users/david/developer/kubeflow-distribution/kubeflow/asm/package/asm_kubeconfig config current-context'
asmcli: -------------
asmcli: Verifying connectivity (10s)...
asmcli: Running: '/opt/homebrew/share/google-cloud-sdk/bin/kubectl --kubeconfig /Users/david/developer/kubeflow-distribution/kubeflow/asm/package/asm_kubeconfig --context gke_kubeflow-project_asia-southeast1-a_kubeflow config view --minify=true -ojson'
asmcli: -------------
asmcli: Running: 'nc -zvw 10 34.142.144.110 443'
asmcli: -------------
Connection to 34.142.144.110 port 443 [tcp/https] succeeded!
asmcli: kubeconfig set to /Users/david/developer/kubeflow-distribution/kubeflow/asm/package/asm_kubeconfig
asmcli: using context gke_kubeflow-project_asia-southeast1-a_kubeflow
asmcli: Getting account information...
asmcli: Running: '/opt/homebrew/share/google-cloud-sdk/bin/gcloud auth list --project=kubeflow-project --filter=status:ACTIVE --format=value(account)'
asmcli: -------------
asmcli: Running: '/opt/homebrew/share/google-cloud-sdk/bin/gcloud config get-value auth/impersonate_service_account'
asmcli: -------------
(unset)
asmcli: Running: '/opt/homebrew/share/google-cloud-sdk/bin/gcloud container clusters list --project=kubeflow-project --filter=name = kubeflow AND location = asia-southeast1-a --format=value(name)'
asmcli: -------------
WARNING: --filter : operator evaluation is changing for consistency across Google APIs.  name=kubeflow currently does not match but will match in the near future.  Run `gcloud topic filters` for details.
asmcli: Running: '/opt/homebrew/share/google-cloud-sdk/bin/kpt version'
asmcli: -------------
asmcli: Downloading kpt..
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 11.9M  100 11.9M    0     0  7227k      0  0:00:01  0:00:01 --:--:-- 19.7M
asmcli: Downloading ASM..
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 26.3M  100 26.3M    0     0  15.0M      0  0:00:01  0:00:01 --:--:-- 15.0M
asmcli: Downloading ASM kpt package...
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------
asmcli: Running: 'kpt pkg get --auto-set=false https://github.com/GoogleCloudPlatform/anthos-service-mesh-packages.git/asm@1.16.2-asm.2+config1 asm'
asmcli: -------------

I think my issue is similar to #443.

What other information would be helpful in debugging this issue?
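As a rough sketch of the kind of host details that are often useful in a report like this, the snippet below prints the operating system, CPU architecture, shell version, and whether kpt is on the PATH. The function name collect_env_info is hypothetical and not part of the kubeflow-distribution repository or asmcli. Whether any of these values relates to the loop is an assumption, not something the logs confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of kubeflow-distribution or asmcli):
# prints basic host details that could be attached to a bug report.
collect_env_info() {
  echo "os=$(uname -s)"                 # e.g. Darwin or Linux
  echo "arch=$(uname -m)"               # e.g. x86_64 or arm64
  echo "bash=${BASH_VERSION:-unknown}"  # empty when not running under bash
  # asmcli invokes kpt in the log above; report where (or whether) it resolves.
  if command -v kpt >/dev/null 2>&1; then
    echo "kpt=$(command -v kpt)"
  else
    echo "kpt=not found on PATH"
  fi
}

collect_env_info
```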

leetdavid commented 1 week ago

I got it working:

Running the deployment on macOS was causing the infinite loop; the same process on my Debian 12 instance seems to have worked.
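A minimal sketch of a pre-flight check based on this workaround, assuming the relevant difference is the host platform: it treats Linux on x86_64 as the known-good case, since the asmcli log above warns that installation is only supported on x86_64 and the Debian run succeeded. The function name check_platform is hypothetical. The thread does not state the architecture of the Debian instance, so the x86_64 requirement here is an assumption taken from the warning alone.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the kubeflow-distribution repo).
# Usage: check_platform OS ARCH
# Prints "ok" for Linux on x86_64, otherwise "unsupported: <os>/<arch>".
check_platform() {
  if [ "$1" = "Linux" ] && [ "$2" = "x86_64" ]; then
    echo "ok"
  else
    echo "unsupported: $1/$2"
  fi
}

# Report the result for the current host.
check_platform "$(uname -s)" "$(uname -m)"
```

To gate the deployment on it, something like `[ "$(check_platform "$(uname -s)" "$(uname -m)")" = "ok" ] && make apply` would skip the run on hosts such as macOS.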