Adds an initContainer definition that uses pg_isready to wait until a given Postgres server is ready to accept connections
Uses this initContainer in the Prefect migrations Job and the Prefect Server Deployment
This ensures that workloads depending on the database first check that the database is ready. This should help avoid CrashLoopBackOff and can, in certain cases, improve overall spin-up time.
This is mostly effective when the database is not running yet (fresh instances or database upgrades), but it is still worth pursuing, especially because it makes a Pod's dependencies explicit and helps us avoid CrashLoopBackOff problems.
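For reviewers unfamiliar with the pattern, here is a minimal sketch of the kind of wait loop such an initContainer could run. It is not the exact command from this PR: the function name `wait_for_postgres`, the `PG_ISREADY` override, and the attempt/delay defaults are all assumptions made for illustration. The only real dependency is `pg_isready`, which exits 0 once the server accepts connections.

```shell
#!/bin/sh
# Hypothetical sketch of a wait-for-database loop (not the operator's actual command).
# Usage: wait_for_postgres HOST PORT [MAX_ATTEMPTS] [DELAY_SECONDS]
wait_for_postgres() {
  host="$1"
  port="$2"
  max_attempts="${3:-60}"   # assumed default, purely illustrative
  delay="${4:-2}"           # assumed default, purely illustrative
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    echo "Waiting for PostgreSQL..."
    # pg_isready prints "<host>:<port> - ..." and exits 0 when the server is ready.
    # PG_ISREADY is an assumed override hook so the probe can be stubbed in tests.
    if "${PG_ISREADY:-pg_isready}" -h "$host" -p "$port"; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "PostgreSQL at $host:$port not ready after $max_attempts attempts" >&2
  return 1
}
```

Called as `wait_for_postgres postgres 5432`, this would block the Pod's main containers until the probe succeeds, so they start only once the database is reachable.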
First, confirm the unit tests still pass. Additionally, you can manually check the logs for the new initContainer:
$ kubectl logs -f prefect-postgres-migration-3232e736-cmh79 -c wait-for-database
Waiting for PostgreSQL...
postgres:5432 - no response
Waiting for PostgreSQL...
postgres:5432 - no response
Waiting for PostgreSQL...
postgres:5432 - no response
Waiting for PostgreSQL...
postgres:5432 - accepting connections
I also ran a fairly unscientific test to compare the total time it took for the PrefectServer to become Ready:
#!/bin/bash
# usage:
# time ./test.sh
# Create instance
kubectl apply -f deploy/samples/v1_prefectserver_postgres.yaml
# Wait for the instance to be ready
kubectl wait --for=jsonpath='{.status.ready}'=true prefectserver/prefect-postgres
# Clean up
kubectl delete -f deploy/samples/v1_prefectserver_postgres.yaml
kubectl delete pvc postgres-database-postgres-0
Results (wall-clock seconds, as reported by time):
from main branch: 31.268 total
from PR branch: 16.666 total
That is a pretty significant difference (the PR branch takes roughly half the time), mostly because the Prefect Server and migrations Pods aren't crash looping while the database comes up.
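To put the two timings in relative terms, a small helper like the one below does the arithmetic. The function name `speedup` is hypothetical, and the numbers come from a single run per branch, so treat the result as indicative only.

```shell
#!/bin/sh
# Hypothetical helper: relative improvement between two wall-clock timings (seconds).
# Usage: speedup BASELINE_SECONDS NEW_SECONDS
speedup() {
  awk -v base="$1" -v new="$2" 'BEGIN {
    # Percentage of time saved, and how many times faster the new run is.
    printf "%.1f%% less time (%.2fx faster)\n", (base - new) / base * 100, base / new
  }'
}

speedup 31.268 16.666   # main branch vs. PR branch timings from above
```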
Related to https://linear.app/prefect/issue/PLA-358/optimize-the-time-it-takes-for-the-prefect-operator-to-create-a-new