GoogleCloudPlatform / kubeflow-distribution

Blueprints for Deploying Kubeflow on Google Cloud Platform and Anthos
Apache License 2.0

Race condition in KServe CRD #384

Closed: gkcalat closed this issue 2 years ago

gkcalat commented 2 years ago

Deploying Kubeflow fails when running make apply from the kubeflow folder, with the following error message:

...
serviceaccount/kserve-models-web-app created
unable to recognize "build/serving.kserve.io_v1alpha1_clusterservingruntime_kserve-lgbserver.yaml": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
unable to recognize "build/serving.kserve.io_v1alpha1_clusterservingruntime_kserve-mlserver.yaml": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
unable to recognize "build/serving.kserve.io_v1alpha1_clusterservingruntime_kserve-paddleserver.yaml": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
unable to recognize "build/serving.kserve.io_v1alpha1_clusterservingruntime_kserve-pmmlserver.yaml": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
unable to recognize "build/serving.kserve.io_v1alpha1_clusterservingruntime_kserve-sklearnserver.yaml": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
unable to recognize "build/serving.kserve.io_v1alpha1_clusterservingruntime_kserve-tensorflow-serving.yaml": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
unable to recognize "build/serving.kserve.io_v1alpha1_clusterservingruntime_kserve-torchserve.yaml": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
unable to recognize "build/serving.kserve.io_v1alpha1_clusterservingruntime_kserve-tritonserver.yaml": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
unable to recognize "build/serving.kserve.io_v1alpha1_clusterservingruntime_kserve-xgbserver.yaml": no matches for kind "ClusterServingRuntime" in version "serving.kserve.io/v1alpha1"
make[1]: *** [Makefile:13: apply] Error 1
make[1]: Leaving directory '.../kubeflow/contrib/kserve'
make: *** [Makefile:83: apply] Error 1

However, rerunning make apply succeeds, presumably because the KServe CRDs created during the first run are registered by then.

Should we consider moving the KServe CRDs out and applying them separately, before deploying the runtime resources, so that there is a gap in time between the two steps?
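
A minimal sketch of that ordering, in Python, not necessarily how it should be wired into the Makefile. It assumes the manifests in build/ follow the group_version_kind_name.yaml naming seen in the log above, so CRD files would contain _customresourcedefinition_ in their names. The helper names (split_manifests, plan_apply) and the CRD file name in the example are hypothetical. The script only builds the kubectl commands and prints them; it does not run anything against a cluster.

```python
from pathlib import PurePosixPath

# Assumption: kustomize-style output names, e.g.
# apiextensions.k8s.io_v1_customresourcedefinition_<name>.yaml
CRD_MARKER = "_customresourcedefinition_"


def split_manifests(paths):
    """Partition manifest paths into (crds, others), preserving input order."""
    crds, others = [], []
    for path in paths:
        name = PurePosixPath(path).name.lower()
        (crds if CRD_MARKER in name else others).append(path)
    return crds, others


def plan_apply(paths, timeout="60s"):
    """Return kubectl argv lists: apply CRDs, wait for them, then apply the rest."""
    crds, others = split_manifests(paths)
    commands = [["kubectl", "apply", "-f", p] for p in crds]
    # Block until the API server reports each CRD as Established, so that
    # custom resources such as ClusterServingRuntime can be recognized.
    commands += [
        ["kubectl", "wait", "--for=condition=established",
         f"--timeout={timeout}", "-f", p]
        for p in crds
    ]
    commands += [["kubectl", "apply", "-f", p] for p in others]
    return commands


if __name__ == "__main__":
    example = [
        "build/serving.kserve.io_v1alpha1_clusterservingruntime_kserve-lgbserver.yaml",
        # Hypothetical CRD file name, for illustration only.
        "build/apiextensions.k8s.io_v1_customresourcedefinition_clusterservingruntimes.serving.kserve.io.yaml",
    ]
    for argv in plan_apply(example):
        print(" ".join(argv))
```

The wait step is the part that closes the race: applying the CRDs first is not enough on its own if the runtime resources follow immediately, before the new kinds are served.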

gkcalat commented 2 years ago

Fixed in v1.6.0