Closed: rb-leadr closed this issue 1 year ago
You can use controller-managed configuration (configuration built from Ingress resources) with DB-less mode; in fact, that's the default. Specifying a kong.yml directly is an alternative.
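For reference, a minimal sketch of the relevant chart values for controller-managed DB-less mode (illustrative values only, not a complete configuration):

```yaml
# Minimal sketch: DB-less Kong with configuration managed by the ingress
# controller (built from Ingress/Service resources rather than a kong.yml).
env:
  database: "off"    # no Postgres; config is held in memory and pushed by the controller
ingressController:
  enabled: true      # watches Ingress resources and syncs them into the proxy
postgresql:
  enabled: false     # no in-cluster database to fail during node rolls
```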
That said, Kong should not go down when Postgres is unavailable. You cannot start new Kong instances while Postgres is offline, but existing instances should continue serving traffic on a best-effort basis out of their configuration cache and reconnect to Postgres when it returns.
You can specify a read-only backup Postgres instance via the various pg_ro_* settings. Setting up a read replica should give you some more flexibility when rolling out an updated Postgres deployment.
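As a rough sketch, those settings live under env alongside the regular pg_ settings; the hostname below is a placeholder for whatever read replica you stand up, and if I recall correctly any pg_ro_* setting left unset falls back to its pg_* counterpart:

```yaml
env:
  # Placeholder read-replica endpoint; setting pg_ro_host enables Kong's
  # read-only Postgres connection.
  pg_ro_host: kong-postgresql-read.kong.svc.cluster.local
  pg_ro_user: kong
```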
If you're seeing existing instances terminate when Postgres is unavailable, that's abnormal, and you should file an issue with the gateway team with reproduction steps and logs.
Will go ahead and close this as I think all of the OP's questions are answered, but if you have additional follow-up questions, reply and we can reopen it.
Hi @rainest - thanks for jumping in to help!
I've been off working on some other more pressing issues, and am now returning to this.
One factor that might be in play: we use FluxCD, which automatically updates our k8s Ingress resources as they change in our main branch in source control. I wonder if this is why the Kong pods in Test B start crashlooping.
I did a bit of testing to get more data - here's the output of that:
Test A (control - delete a non-pg node)
Step 1: Determine which nodes the Kong, Postgres, and other pods are running on, and terminate a node other than the one running Postgres from the AWS console (not kubectl)
Step 2: Run kubectl get pods -n kong -w
Step 3: Run kubectl get nodes -w
Step 4: Periodically run a curl command against a healthcheck endpoint behind Kong
Test A Results
Test B (terminate node from AWS)
Step 1: Determine which node the Postgres pod is running on, and terminate that node from the AWS console (not kubectl)
Step 2: Run kubectl get pods -n kong -w
Step 3: Run kubectl get nodes -w
Step 4: Periodically run a curl command against a healthcheck endpoint behind Kong
Test B Results
Test C (terminate node from kubectl)
Step 1: Determine which node the Postgres pod is running on, and run kubectl delete node on it
Step 2: Run kubectl get pods -n kong -w
Step 3: Run kubectl get nodes -w
Step 4: Periodically run a curl command against a healthcheck endpoint behind Kong
Test C Results
Test D (terminate pg pod)
Step 1: Run kubectl delete pod kong-postgresql-0 -n kong
Step 2: Run kubectl get pods -n kong -w
Step 3: Run kubectl get nodes -w
Step 4: Periodically run a curl command against a healthcheck endpoint behind Kong
Test D Results: Postgres shut down gracefully and restarted fairly quickly, resulting in no disruption. Overall test result: PASS
Overall, the only concerning failure scenario is when one of our EKS nodes fails in EC2 and the control plane is unaware it happened, so it keeps trying to reconnect before finally giving up - it's during this limbo that the pods crashloop.
This would seem to be an exceedingly rare case, but something I am slightly concerned about due to the severity of the outcome when it does happen.
I think I do need to make a couple of adjustments to our config as a matter of better practice:
1) Consider using only 2 AZs instead of 3 to decrease the likelihood of not being able to spin up Postgres on a node in the same AZ as the PersistentVolume
2) Consider increasing capacity slightly by adding a node to the zone where the PV is located
3) Split the proxy and ingress controller containers into separate pods (I'm not sure this would really help the specific issue at hand, but having them bundled as currently configured is a k8s antipattern)
4) Consider moving to DB-less configuration
5) Consider switching to a different Postgres cluster controlled outside of Kubernetes, since we already have one in RDS that our app uses (see the sketch after this list)
6) Consider adding a second read replica to Postgres that would reside on a different node in the same AZ - not sure how to configure that
If you have any other suggestions given the symptoms and the config below, please let me know! I'd love to hear them. Thanks in advance!
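For 5), I'm thinking something along these lines, with placeholder endpoint/secret names rather than our real values:

```yaml
# Sketch of option 5: point Kong at an external Postgres (RDS) instead of the
# chart-managed in-cluster instance. Endpoint and secret key are placeholders.
postgresql:
  enabled: false    # stop deploying the in-cluster kong-postgresql StatefulSet
env:
  pg_host: kong-db.abc123.us-west-2.rds.amazonaws.com   # hypothetical RDS endpoint
  pg_database: kong
  pg_user: kong
  pg_ssl: "on"      # RDS supports TLS, so we could turn this on too
  pg_password:
    valueFrom:
      secretKeyRef:
        name: kong-config-secret
        key: pg_password    # hypothetical key holding the RDS password
```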
Ryan
@rainest Here's our HelmRelease definition:
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: kong
  namespace: kong
spec:
  chart:
    spec:
      version: "2.15.3"
      chart: kong
      sourceRef:
        kind: HelmRepository
        name: kong
        namespace: kong
  interval: 1h0m0s
  install:
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
      strategy: "rollback"
  values:
    # Kong for Kubernetes with Kong Enterprise with Enterprise features enabled and
    # exposed via TLS-enabled Ingresses. Before installing:
    # * Several settings (search for the string "CHANGEME") require user-provided
    #   Secrets. These Secrets must be created before installation.
    # * Ingresses reference example "<service>.kong.CHANGEME.example" hostnames. These must
    #   be changed to an actual hostname that resolves to your proxy.
    # * Ensure that your session configurations create cookies that are usable
    #   across your services. The admin session configuration must create cookies
    #   that are sent to both the admin API and Kong Manager, and any Dev Portal
    #   instances with authentication must create cookies that are sent to both
    #   the Portal and Portal API.
    fullnameOverride: kong
    admin:
      annotations:
        konghq.com/protocol: https
      enabled: true
      http:
        enabled: false
      ingress:
        annotations:
          konghq.com/https-redirect-status-code: "301"
          konghq.com/protocols: https
          konghq.com/strip-path: "true"
          kubernetes.io/ingress.class: kong
          nginx.ingress.kubernetes.io/app-root: /
          nginx.ingress.kubernetes.io/backend-protocol: HTTPS
          nginx.ingress.kubernetes.io/permanent-redirect-code: "301"
        enabled: true
        hostname: kong.gateway.env.company.service
        path: /api
        tls: kong-admin-cert
      tls:
        containerPort: 8444
        enabled: true
        parameters:
          - http2
        servicePort: 8444
      type: ClusterIP
    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                  - key: app.kubernetes.io/instance
                    operator: In
                    values:
                      - dataplane
              topologyKey: kubernetes.io/hostname
            weight: 100
    certificates:
      enabled: true
      issuer: kong-selfsigned-issuer
      cluster:
        enabled: true
      admin:
        enabled: true
        commonName: kong.gateway.env.company.service
      portal:
        enabled: true
        commonName: developer.gateway.env.company.service
      proxy:
        enabled: true
        commonName: gateway.env.company.dev
        dnsNames:
          - '*.gateway.env.company.dev'
          - 'env.company.dev'
          - '*.env.company.dev'
    cluster:
      enabled: true
      labels:
        konghq.com/service: cluster
      tls:
        containerPort: 8005
        enabled: true
        servicePort: 8005
      type: ClusterIP
    clustertelemetry:
      enabled: true
      tls:
        containerPort: 8006
        enabled: true
        servicePort: 8006
      type: ClusterIP
    deployment:
      kong:
        daemonset: false
        enabled: true
    enterprise:
      enabled: true
      license_secret: kong-enterprise-license
      portal:
        enabled: true
      rbac:
        admin_api_auth: basic-auth
        admin_gui_auth_conf_secret: kong-config-secret
        enabled: true
        session_conf_secret: kong-config-secret
      smtp:
        enabled: false
      vitals:
        enabled: true
    env:
      admin_access_log: /dev/stdout
      admin_api_uri: https://kong.gateway.env.company.service/api
      admin_error_log: /dev/stdout
      admin_gui_access_log: /dev/stdout
      admin_gui_error_log: /dev/stdout
      admin_gui_host: kong.gateway.env.company.service
      admin_gui_protocol: https
      admin_gui_url: https://kong.gateway.env.company.service/
      cluster_data_plane_purge_delay: 60
      cluster_listen: 0.0.0.0:8005
      cluster_telemetry_listen: 0.0.0.0:8006
      database: postgres
      log_level: debug
      lua_package_path: /opt/?.lua;;
      nginx_worker_processes: "2"
      password:
        valueFrom:
          secretKeyRef:
            key: kong_admin_password
            name: kong-config-secret
      pg_database: kong
      pg_host:
        valueFrom:
          secretKeyRef:
            key: pg_host
            name: kong-config-secret
      pg_ssl: "off"
      pg_ssl_verify: "off"
      pg_user: kong
      plugins: bundled,openid-connect
      portal: true
      portal_api_access_log: /dev/stdout
      portal_api_error_log: /dev/stdout
      portal_api_url: https://developer.gateway.env.company.service/api
      portal_auth: basic-auth
      portal_cors_origins: '*'
      portal_gui_access_log: /dev/stdout
      portal_gui_error_log: /dev/stdout
      portal_gui_host: developer.gateway.env.company.service
      portal_gui_protocol: https
      portal_gui_url: https://developer.gateway.env.company.service/
      portal_session_conf:
        valueFrom:
          secretKeyRef:
            key: portal_session_conf
            name: kong-config-secret
      prefix: /kong_prefix/
      proxy_access_log: /dev/stdout
      proxy_error_log: /dev/stdout
      proxy_stream_access_log: /dev/stdout
      proxy_stream_error_log: /dev/stdout
      smtp_mock: "on"
      status_listen: 0.0.0.0:8100
      trusted_ips: 0.0.0.0/0,::/0
      vitals: true
    extraLabels:
      konghq.com/component: kong
    image:
      repository: kong/kong-gateway
      tag: "3.0"
    ingressController:
      enabled: true
      env:
        kong_admin_filter_tag: ingress_controller_kong
        kong_admin_tls_skip_verify: true
        kong_admin_token:
          valueFrom:
            secretKeyRef:
              key: password
              name: kong-config-secret
        kong_admin_url: https://localhost:8444
        kong_workspace: default
        publish_service: kong/kong-proxy
      image:
        repository: docker.io/kong/kubernetes-ingress-controller
        tag: "2.7"
      ingressClass: kong
      installCRDs: false
    manager:
      annotations:
        konghq.com/protocol: https
      enabled: true
      http:
        containerPort: 8002
        enabled: false
        servicePort: 8002
      ingress:
        annotations:
          konghq.com/https-redirect-status-code: "301"
          kubernetes.io/ingress.class: kong
          nginx.ingress.kubernetes.io/backend-protocol: HTTPS
        enabled: true
        hostname: kong.gateway.env.company.service
        path: /
        tls: kong-admin-cert
      tls:
        containerPort: 8445
        enabled: true
        parameters:
          - http2
        servicePort: 8445
      type: ClusterIP
    migrations:
      enabled: true
      postUpgrade: true
      preUpgrade: true
    namespace: kong
    podAnnotations:
      kuma.io/gateway: enabled
      prometheus.io/port: "8100"
      prometheus.io/scrape: "true"
    portal:
      annotations:
        konghq.com/protocol: https
      enabled: true
      http:
        containerPort: 8003
        enabled: false
        servicePort: 8003
      ingress:
        annotations:
          konghq.com/https-redirect-status-code: "301"
          konghq.com/protocols: https
          konghq.com/strip-path: "false"
          kubernetes.io/ingress.class: kong
        enabled: true
        hostname: developer.gateway.env.company.service
        path: /
        tls: kong-portal-cert
      tls:
        containerPort: 8446
        enabled: true
        parameters:
          - http2
        servicePort: 8446
      type: ClusterIP
    portalapi:
      annotations:
        konghq.com/protocol: https
      enabled: true
      http:
        enabled: false
      ingress:
        annotations:
          konghq.com/https-redirect-status-code: "301"
          konghq.com/protocols: https
          konghq.com/strip-path: "true"
          kubernetes.io/ingress.class: kong
          nginx.ingress.kubernetes.io/app-root: /
        enabled: true
        hostname: developer.gateway.env.company.service
        path: /api
        tls: kong-portal-cert
      tls:
        containerPort: 8447
        enabled: true
        parameters:
          - http2
        servicePort: 8447
      type: ClusterIP
    postgresql:
      enabled: true
      auth:
        database: kong
        username: kong
    proxy:
      annotations:
        prometheus.io/port: "8100"
        prometheus.io/scrape: "true"
        service.beta.kubernetes.io/aws-load-balancer-backend-protocol: "https"
        service.beta.kubernetes.io/aws-load-balancer-ssl-cert: "arn:aws:acm:us-west-2:<awsAccount>:certificate/<cert-guid>"
        service.beta.kubernetes.io/aws-load-balancer-ssl-ports: "443"
        service.beta.kubernetes.io/aws-load-balancer-type: "alb"
      enabled: true
      http:
        containerPort: 8080
        enabled: false
      ingress:
        enabled: false
      labels:
        enable-metrics: true
      tls:
        containerPort: 8080
        enabled: true
      type: LoadBalancer
    serviceMonitor:
      enabled: true
      additionalLabels:
        app.kubernetes.io/part-of: kube-prometheus-stack
      interval: 10s
      namespace: monitoring
    replicaCount: 3
    secretVolumes: []
    status:
      enabled: true
      http:
        containerPort: 8100
        enabled: true
      tls:
        containerPort: 8543
        enabled: false
    updateStrategy:
      rollingUpdate:
        maxSurge: 100%
        maxUnavailable: 100%
      type: RollingUpdate
```
Hello - We're using kong with annotated kubernetes ingress and service resources driving the config.
The problem we're facing is when we do a node roll, kong goes down while the postgresql pod is unavailable, taking our APIs down.
Having 3 kong pods for HA and still having a single point of failure seems like an anti-pattern.
I looked into DB-less, but it looks like that requires a kong.yml config, which can't be configured in independent namespaces the way Ingresses and Services can.
Is there a way to either use ingress/service resources without postgres, or to make the postgres truly HA?