With IAP enabled for the RAG application, the rag-frontend workload runs out of memory after a couple of days and stops serving. I traced this to a memory leak when serving HTTP requests. With IAP enabled, the K8s ingress server pings the HTTP server continuously, which exposes the leak more quickly.
The culprit is at this line in frontend/container/cloud_sql.py.
This can be fixed by moving

    connector = Connector()

into the if statement, so that a Connector is created only when the connection pool is first initialized:

    global db
    if db is None:
        connector = Connector()
        db = init_connection_pool(connector)
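
To illustrate why the placement matters, here is a minimal, self-contained sketch. It does not use the real Connector class: FakeConnector, the stub init_connection_pool, and the function names get_db_leaky, get_db_fixed, and count_connectors are all hypothetical stand-ins, and the "leaky" variant is my assumption about what the current code looks like, inferred from the fix above. The sketch only counts how many connector objects get constructed; it does not reproduce the memory leak itself.

```python
class FakeConnector:
    """Hypothetical stand-in for the real Connector; counts constructions."""
    instances = 0

    def __init__(self):
        FakeConnector.instances += 1


def init_connection_pool(connector):
    # Stub: the real function would build a connection pool from the connector.
    return {"connector": connector}


db = None


def get_db_leaky():
    """Assumed current behavior: a connector is built on every call."""
    global db
    connector = FakeConnector()  # runs even when db already exists
    if db is None:
        db = init_connection_pool(connector)
    return db


def get_db_fixed():
    """Proposed behavior: a connector is built only on first initialization."""
    global db
    if db is None:
        connector = FakeConnector()
        db = init_connection_pool(connector)
    return db


def count_connectors(getter, calls):
    """Reset state, call getter `calls` times, return connectors constructed."""
    global db
    db = None
    FakeConnector.instances = 0
    for _ in range(calls):
        getter()
    return FakeConnector.instances


if __name__ == "__main__":
    print("leaky:", count_connectors(get_db_leaky, 100))
    print("fixed:", count_connectors(get_db_fixed, 100))
```

In the leaky variant the number of connectors grows with the number of requests, which would match the faster memory growth seen when the ingress health checks hit the server continuously. In the fixed variant it stays at one regardless of request count.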