GoogleCloudPlatform / kubeflow-distribution

Blueprints for Deploying Kubeflow on Google Cloud Platform and Anthos
Apache License 2.0

Error during kubeflow deployment on GCP #446

Open munish07 opened 11 months ago

munish07 commented 11 months ago

I am trying to set up Kubeflow on GCP. I have the management cluster up and running.

When I run make apply, I see the error below.

xxxxxx@cloudshell:~/kubeflow-distribution/kubeflow (xxxxxx-cw-internal)$ make apply
cluster_name_regex=^[a-z][-a-z0-9]{0,22}[a-z0-9]$
The kubeflow cluster name "kubeflow" is valid.
PROJECT=xxxxxx-cw-internal NAME=kubeflow ./hack/check_domain_length.sh
Build directory: ./build
Component path: common/managed-storage
Apply component resources: common/managed-storage
Found Makefile, call 'make apply' of this component Makefile. 
make[1]: Entering directory '/home/xxxxxx/kubeflow-distribution/kubeflow/common/managed-storage'
rm -rf ./build
mkdir -p ./build
kustomize build -o ./build/ .
kubectl --context=kf-mgmt apply -f ./build
sqlinstance.sql.cnrm.cloud.google.com/kubeflow-kfp created
storagebucket.storage.cnrm.cloud.google.com/xxxxxx-cw-internal-kfp created
# Wait for all Google Cloud resources to get created and become ready.
# If this takes long, you can view status by:
cd common/managed-storage && make status
# For resources with READY=False, debug by:
kubectl --context=kf-mgmt -n xxxxxx-cw-internal describe <KIND>/<NAME>

kubectl --context=kf-mgmt wait --for=condition=Ready --timeout=100s -f ./build \
        || kubectl --context=kf-mgmt get -f ./build
timed out waiting for the condition on sqlinstances/kubeflow-kfp
timed out waiting for the condition on storagebuckets/xxxxxx-cw-internal-kfp
NAME                                                 AGE    READY   STATUS         STATUS AGE
sqlinstance.sql.cnrm.cloud.google.com/kubeflow-kfp   104s   False   UpdateFailed   104s

NAME                                                                 AGE    READY   STATUS         STATUS AGE
storagebucket.storage.cnrm.cloud.google.com/xxxxxx-cw-internal-kfp   104s   False   UpdateFailed   104s
kubectl --context=kf-mgmt wait --for=condition=Ready --timeout=500s -f ./build
timed out waiting for the condition on sqlinstances/kubeflow-kfp
timed out waiting for the condition on storagebuckets/xxxxxx-cw-internal-kfp
make[1]: *** [Makefile:48: wait] Error 1
make[1]: Leaving directory '/home/xxxxxx/kubeflow-distribution/kubeflow/common/managed-storage'
make: *** [Makefile:83: apply] Error 1

It is not able to create the Cloud SQL instance or the GCS bucket.

Ajakhongir commented 11 months ago

I am getting the same error.

saisona commented 3 weeks ago

Which Kubeflow version are you trying to deploy (the manifest version, I mean)?

Also, did you run make apply-kcc in the kubeflow cluster and, before that, make grant-owner-permission in the management cluster? Your issue seems to be related to permissions :D

Try running kubectl describe storagebucket in the management cluster.
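
The make output in the original post already hints at the same step: describe every resource whose READY column is False. A minimal sketch of that, assuming the context (kf-mgmt) and namespace (xxxxxx-cw-internal) shown in the log, is below. The helper name print_describe_cmds is hypothetical and not part of the repository; it only prints the kubectl describe commands so they can be reviewed before running.

```shell
#!/usr/bin/env bash
# Sketch: turn the status table printed by `kubectl get -f ./build` into
# `kubectl describe` commands for every resource whose READY column is False.
# CONTEXT and NAMESPACE are copied from the log in this issue (assumptions
# for any other setup); adjust them to your own management cluster.
CONTEXT="kf-mgmt"
NAMESPACE="xxxxxx-cw-internal"

# Hypothetical helper: reads the status table on stdin.
# Column 1 is KIND/NAME, column 3 is READY.
print_describe_cmds() {
  awk -v ctx="$CONTEXT" -v ns="$NAMESPACE" '
    $1 != "NAME" && $3 == "False" {
      printf "kubectl --context=%s -n %s describe %s\n", ctx, ns, $1
    }'
}

# Example input: the table from the failed run in the original post.
status_table='NAME                                                 AGE    READY   STATUS         STATUS AGE
sqlinstance.sql.cnrm.cloud.google.com/kubeflow-kfp   104s   False   UpdateFailed   104s

NAME                                                                 AGE    READY   STATUS         STATUS AGE
storagebucket.storage.cnrm.cloud.google.com/xxxxxx-cw-internal-kfp   104s   False   UpdateFailed   104s'

cmds="$(printf '%s\n' "$status_table" | print_describe_cmds)"
printf '%s\n' "$cmds"
```

Against a live cluster, the table could instead be piped in from kubectl --context=kf-mgmt get -f ./build, run inside common/managed-storage. The Events and Status sections of the describe output usually carry the underlying reason for UpdateFailed, which should show whether this is a permissions problem.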