canonical / traefik-k8s-operator

This charmed operator automates the operational procedures of running Traefik, an open-source application proxy.
https://charmhub.io/traefik-k8s
Apache License 2.0

Traefik forwards traffic to backends which are down #262

Open gnuoy opened 9 months ago

gnuoy commented 9 months ago

Bug Description

Traefik does not seem to do an aliveness check on the backends it is forwarding traffic to. This causes client requests to fail if a backend is down.

Perhaps charms.traefik_k8s.v2.ingress should support the requirer passing a health check URL?

To Reproduce

1) Deploy this bundle: https://opendev.org/openstack/charm-keystone-k8s/src/branch/main/tests/bundles/smoke.yaml
2) Add a keystone unit: juju add-unit keystone
3) Wait for the unit to be ready
4) URL=$(juju run keystone/leader get-admin-account | awk 'BEGIN {FS="="} /OS_AUTH_URL/ {print $NF}')
5) curl $URL (repeat multiple times to check both backends are alive, as traefik will round-robin the backends)
6) juju ssh --container keystone keystone/1 "pebble stop wsgi-keystone"
7) Repeat step 5; every other request will be a bad gateway
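
Steps 5 and 7 can be scripted so the failure ratio is easier to see. The following is a minimal sketch, not part of the original report: the helper names http_status and tally_statuses are made up for illustration, and the URL is whatever step 4 produced.

```python
from collections import Counter
from typing import Callable
import urllib.error
import urllib.request


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code of a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is what we want to count.
        return err.code


def tally_statuses(fetch: Callable[[], int], attempts: int = 10) -> Counter:
    """Call fetch() `attempts` times and count each status code seen."""
    return Counter(fetch() for _ in range(attempts))


# Usage (needs a reachable deployment; `url` is the value from step 4):
#   print(dict(tally_statuses(lambda: http_status(url))))
```

With one of the two backends stopped, roughly half of the counted responses would be expected to be 502, matching the round-robin behaviour described in step 7. Connection-level errors to Traefik itself are not caught here and would raise.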

Example output: https://paste.ubuntu.com/p/zHrmjFrQ7g/

Environment

juju 3.2.3-genericlinux-amd64
Controller in microk8s
Traefik charm: 1.0/candidate r148

Relevant log output

2023-10-04T09:04:43.426Z [traefik] time="2023-10-04T09:04:43Z" level=debug msg="'502 Bad Gateway' caused by: dial tcp 10.1.188.231:5000: connect: connection refused"
2023-10-04T09:04:44.446Z [traefik] time="2023-10-04T09:04:44Z" level=debug msg="'502 Bad Gateway' caused by: dial tcp 10.1.188.231:5000: connect: connection refused"

Additional context

No response

PietroPasotti commented 9 months ago

Could definitely wrap around https://doc.traefik.io/traefik/routing/services/#health-check. We'll discuss prioritization in the next backlog refinement.
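
For reference, a rough sketch of what the wrapped configuration could look like. The healthCheck block with path, interval and timeout is taken from the Traefik documentation linked above; the function name, the service name, the default values and the "/healthz" path are assumptions for illustration and do not reflect how the charm currently generates its config.

```python
import json


def service_with_health_check(name, server_urls, path, interval="10s", timeout="3s"):
    """Build a Traefik dynamic-config fragment for one load-balanced service.

    Servers that fail the health check are taken out of the rotation by Traefik.
    """
    return {
        "http": {
            "services": {
                name: {
                    "loadBalancer": {
                        "servers": [{"url": url} for url in server_urls],
                        "healthCheck": {
                            "path": path,  # hypothetical path, requirer-specific
                            "interval": interval,
                            "timeout": timeout,
                        },
                    }
                }
            }
        }
    }


config = service_with_health_check(
    "keystone",                      # assumed service name
    ["http://10.1.188.231:5000"],    # backend address seen in the log output
    "/healthz",                      # placeholder, not a confirmed keystone endpoint
)
print(json.dumps(config, indent=2))
```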

sed-i commented 8 months ago

Path to a health check seems like a reasonable addition to the reldata schema.
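
One possible shape for that addition, as a sketch only. The class below is hypothetical and is not the actual charms.traefik_k8s.v2.ingress schema; all field names, including health_check_path, are assumptions. Keeping the new field optional would let existing requirers continue to work unchanged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RequirerAppData:
    """Hypothetical app-level relation data sent by the ingress requirer."""

    model: str
    name: str
    port: int
    # Proposed field: None means "no health check configured".
    health_check_path: Optional[str] = None

    def __post_init__(self):
        path = self.health_check_path
        if path is not None and not path.startswith("/"):
            raise ValueError(f"health_check_path must start with '/': {path!r}")


# A requirer that opts in, and one that does not.
with_check = RequirerAppData("openstack", "keystone", 5000, "/healthz")
without_check = RequirerAppData("openstack", "keystone", 5000)
```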