GoogleCloudPlatform / ai-on-gke

AI on GKE is a collection of examples, best-practices, and prebuilt solutions to help build, deploy, and scale AI Platforms on Google Kubernetes Engine
Apache License 2.0

RAG Application - release-1.1 - Failing running Terraform #649

Open vmasilva opened 2 months ago

vmasilva commented 2 months ago

Terraform apply fails. When running terraform apply, it fails while deploying the Kubernetes resources.

Used Branch: release-1.1

Logs:

module.inference-server.kubernetes_deployment.inference_deployment: Still creating... [29m51s elapsed]
╷
│ Warning: "default_secret_name" is no longer applicable for Kubernetes v1.24.0 and above
│ 
│   with module.frontend.module.frontend-workload-identity.kubernetes_service_account.main[0],
│   on .terraform/modules/frontend.frontend-workload-identity/modules/workload-identity/main.tf line 51, in resource "kubernetes_service_account" "main":
│   51: resource "kubernetes_service_account" "main" {
│ 
│ Starting from version 1.24.0 Kubernetes does not automatically generate a token for service accounts, in this case, "default_secret_name" will be empty
│ 
│ (and 2 more similar warnings elsewhere)
╵
╷
│ Error: Waiting for rollout to finish: 3 replicas wanted; 2 replicas Ready
│ 
│   with module.frontend.kubernetes_deployment.rag_frontend_deployment,
│   on frontend/main.tf line 85, in resource "kubernetes_deployment" "rag_frontend_deployment":
│   85: resource "kubernetes_deployment" "rag_frontend_deployment" {
│ 
╵
╷
│ Error: Waiting for rollout to finish: 1 replicas wanted; 0 replicas Ready
│ 
│   with module.inference-server.kubernetes_deployment.inference_deployment,
│   on ../../tutorials-and-examples/hf-tgi/main.tf line 49, in resource "kubernetes_deployment" "inference_deployment":
│   49: resource "kubernetes_deployment" "inference_deployment" {

Note: I deployed with the default configuration. The only change is that I am deploying without a GPU.

file: main.tf (line: 79)
  enable_gpu         = false

Can someone help me troubleshoot this?

jsoohoo-google commented 2 months ago

@vmasilva can you please share the TF cluster configuration you are using?

Note that we recommend running with GPUs for performance.

(We have also done a few fixes in 1.1.2, but I don't believe they would have directly addressed your issue.)