GoogleCloudPlatform / kubeflow-distribution

Blueprints for Deploying Kubeflow on Google Cloud Platform and Anthos
Apache License 2.0

timed out waiting for the condition on sqlinstances/kubeflow-kfp in Kubeflow Deployment #447

Open Ajakhongir opened 9 months ago

Ajakhongir commented 9 months ago

I have deployed the management cluster successfully. In the Kubeflow Deployment step, I get the following error when running the `make apply` command:

`cluster_name_regex=^[a-z][-a-z0-9]{0,22}[a-z0-9]$
The kubeflow cluster name "kubeflow" is valid.
PROJECT=my-project NAME=kubeflow ./hack/check_domain_length.sh
Build directory: ./build
Component path: common/managed-storage
Apply component resources: common/managed-storage
Found Makefile, call 'make apply' of this component Makefile.
make[1]: Entering directory '/home/jakhongirn1/kubeflow-distribution/kubeflow/common/managed-storage'
rm -rf ./build
mkdir -p ./build
kustomize build -o ./build/ .
Warning: 'commonLabels' is deprecated. Please use 'labels' instead. Run 'kustomize edit fix' to update your Kustomization automatically.
kubectl --context=first-management-cluster apply -f ./build
sqlinstance.sql.cnrm.cloud.google.com/kubeflow-kfp unchanged
storagebucket.storage.cnrm.cloud.google.com/my-project-kfp unchanged
Wait for all Google Cloud resources to get created and become ready.
If this takes long, you can view status by:
cd common/managed-storage && make status
For resources with READY=False, debug by:
kubectl --context=first-management-cluster -n my-project describe /

kubectl --context=first-management-cluster wait --for=condition=Ready --timeout=100s -f ./build \
|| kubectl --context=first-management-cluster get -f ./build
timed out waiting for the condition on sqlinstances/kubeflow-kfp
timed out waiting for the condition on storagebuckets/my-project-kfp
NAME                                                 AGE    READY   STATUS         STATUS AGE
sqlinstance.sql.cnrm.cloud.google.com/kubeflow-kfp   144m   False   UpdateFailed   144m

NAME                                                         AGE    READY   STATUS         STATUS AGE
storagebucket.storage.cnrm.cloud.google.com/my-project-kfp   144m   False   UpdateFailed   144m
kubectl --context=first-management-cluster wait --for=condition=Ready --timeout=500s -f ./build
timed out waiting for the condition on sqlinstances/kubeflow-kfp
timed out waiting for the condition on storagebuckets/my-project-kfp
make[1]: [Makefile:48: wait] Error 1
make[1]: Leaving directory '/home/jakhongirn1/kubeflow-distribution/kubeflow/common/managed-storage'
make: [Makefile:83: apply] Error 1`
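The Makefile output suggests running `describe` on every resource with READY=False. The sketch below automates that step. It is a minimal, hypothetical helper (the function names `list_not_ready` and `describe_not_ready` are not part of kubeflow-distribution). It assumes bash, awk and kubectl are on PATH and that `kubectl get -f ./build` prints the NAME/AGE/READY table layout shown above. The context and namespace defaults are simply the values from this issue.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of kubeflow-distribution.

# Reads `kubectl get` table output on stdin and prints the NAME of every row
# whose READY column is "False". kubectl prints one table per resource kind,
# each with its own NAME header, so the READY column index is re-detected
# at every header line.
list_not_ready() {
  awk '
    $1 == "NAME" { col = 0; for (i = 1; i <= NF; i++) if ($i == "READY") col = i; next }
    NF > 0 && col > 0 && $col == "False" { print $1 }
  '
}

# Runs `kubectl describe` on every not-ready resource in a build directory.
# CONTEXT and NAMESPACE default to the values seen in this issue (assumption).
describe_not_ready() {
  local context="${CONTEXT:-first-management-cluster}"
  local namespace="${NAMESPACE:-my-project}"
  local build_dir="${1:-./build}"
  kubectl --context="$context" get -f "$build_dir" \
    | list_not_ready \
    | while read -r resource; do
        kubectl --context="$context" -n "$namespace" describe "$resource"
      done
}
```

Run it from `common/managed-storage` as `describe_not_ready ./build`. Parsing the table with awk keeps the sketch free of extra dependencies such as jq. The trade-off is that it breaks if kubectl changes its column layout.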

I tried to debug the SQL instance resource by running this command:

kubectl --context=first-management-cluster -n my-project describe sqlinstance.sql.cnrm.cloud.google.com/kubeflow-kfp

and got this output:

`Name:         kubeflow-kfp
Namespace:    my-project
Labels:       app=managed-storage
              kf-name=kubeflow
Annotations:  cnrm.cloud.google.com/management-conflict-prevention-policy: none
              cnrm.cloud.google.com/project-id: my-project
              cnrm.cloud.google.com/state-into-spec: merge
API Version:  sql.cnrm.cloud.google.com/v1beta1
Kind:         SQLInstance
Metadata:
  Creation Timestamp:  2023-12-09T20:32:42Z
  Generation:          1
  Resource Version:    47133
  UID:                 d160ac6d-f2f0-40b6-ae94-04f6d47c0882
Spec:
  Database Version:  MYSQL_8_0
  Region:            us-central1
  Settings:
    Availability Type:  ZONAL
    Location Preference:
      Zone:  us-central1-c
    Tier:    db-custom-1-3840
Status:
  Conditions:
    Last Transition Time:  2023-12-09T20:32:43Z
    Message:               Update call failed: error fetching live state: error reading underlying resource: summary: Error when reading or editing SQL Database Instance "kubeflow-kfp": Get "https://sqladmin.googleapis.com/sql/v1beta4/projects/my-project/instances/kubeflow-kfp?alt=json&prettyPrint=false": metadata: GCE metadata "instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.readonly" not defined
    Reason:                UpdateFailed
    Status:                False
    Type:                  Ready
  Observed Generation:     1
Events:
  Type     Reason        Age                    From                    Message
  Warning  UpdateFailed  4m40s (x76 over 144m)  sqlinstance-controller  Update call failed: error fetching live state: error reading underlying resource: summary: Error when reading or editing SQL Database Instance "kubeflow-kfp": Get "https://sqladmin.googleapis.com/sql/v1beta4/projects/my-project/instances/kubeflow-kfp?alt=json&prettyPrint=false": metadata: GCE metadata "instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.readonly" not defined`
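The metadata path in that error message is percent-encoded, which makes the requested OAuth scopes hard to read. The small bash sketch below decodes it. `url_decode` is a hypothetical helper, not part of kubeflow-distribution or kubectl. Decoding shows that the sqlinstance-controller asked the GCE metadata server for the default service account's token with the `cloud-platform`, `userinfo.email` and `drive.readonly` scopes. The metadata server answered "not defined". That suggests, though does not prove, that the failure happened while the controller was fetching Google credentials, before any Cloud SQL call completed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: decodes %XX escapes in a string (bash-only).
url_decode() {
  local s="$1" out="" hex
  while [[ "$s" == *%* ]]; do
    out+="${s%%\%*}"               # keep the text before the next '%'
    s="${s#*\%}"                   # drop everything up to and including that '%'
    hex="${s:0:2}"                 # the two hex digits after '%'
    s="${s:2}"
    printf -v hex '%b' "\\x${hex}" # turn e.g. "3A" into ":"
    out+="$hex"
  done
  printf '%s\n' "${out}${s}"
}

# Decode the metadata path from the UpdateFailed message above.
url_decode 'instance/service-accounts/default/token?scopes=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email%2Chttps%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.readonly'
# prints: instance/service-accounts/default/token?scopes=https://www.googleapis.com/auth/cloud-platform,https://www.googleapis.com/auth/userinfo.email,https://www.googleapis.com/auth/drive.readonly
```

The loop uses only bash parameter expansion and `printf -v`, so it avoids depending on `python` or `jq` being installed on the admin machine.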

I don't know what to do next. Please help me out with this problem.

Thanks

chensun commented 7 months ago

@Ajakhongir have you tried contacting GCP support about Config Controller? https://cloud.google.com/anthos-config-management/docs/concepts/config-controller-overview