AI on GKE is a collection of examples, best-practices, and prebuilt solutions to help build, deploy, and scale AI Platforms on Google Kubernetes Engine
Apache License 2.0
RAG terraform apply fails on Autopilot (AP) cluster because AP does not scale up fast enough to deploy GMP #750
RAG terraform apply occasionally fails when it tries to deploy GMP while the AP cluster has zero nodes:
Error: Internal error occurred: failed calling webhook "default.podmonitorings.gmp-operator.gke-gmp-system.monitoring.googleapis.com": failed to call webhook: Post "https://gmp-operator.gke-gmp-system.svc:443/default/monitoring.googleapis.com/v1/podmonitorings?timeout=10s": no endpoints available for service "gmp-operator"
with module.kuberay-monitoring.helm_release.gmp-engine,
on ../../modules/kuberay-monitoring/main.tf line 21, in resource "helm_release" "gmp-engine":
21: resource "helm_release" "gmp-engine" {
failed cloud build log
The cluster is fully working a few minutes later, but by then the QSS deployment has already been marked as failed.
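One possible workaround, sketched here only as an assumption and not as the fix adopted by the repository, is to make the `gmp-engine` release wait until the `gmp-operator` webhook has a running backend. The `null_resource` name `wait_for_gmp_operator` is hypothetical, and the sketch assumes that `kubectl` is available and already authenticated against the cluster wherever Terraform runs, and that the operator is a Deployment named `gmp-operator` in the `gke-gmp-system` namespace (inferred from the service name in the error above).

```hcl
# Hypothetical helper: block until the gmp-operator Deployment is Available,
# which gives the cluster time to provision a node for the pending pod.
resource "null_resource" "wait_for_gmp_operator" {
  provisioner "local-exec" {
    # Assumes kubectl is on PATH and points at the target cluster.
    command = "kubectl wait --namespace gke-gmp-system --for=condition=Available deployment/gmp-operator --timeout=600s"
  }
}

resource "helm_release" "gmp-engine" {
  # Existing arguments from modules/kuberay-monitoring/main.tf stay unchanged.

  # Install the chart only after the webhook backend is reachable.
  depends_on = [null_resource.wait_for_gmp_operator]
}
```

The 600s timeout is an arbitrary illustrative value; it would need to be tuned to how long the cluster takes to scale up from zero nodes.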