canonical / traefik-k8s-operator

This charmed operator automates the operational procedures of running Traefik, an open-source application proxy.
https://charmhub.io/traefik-k8s
Apache License 2.0

Traefik blocks when get svc only returns hostname, not ip address #378

Closed: asbalderson closed this issue 2 months ago.

asbalderson commented 2 months ago

Bug Description

In commit 354, the way the hostname/address for a load balancer is determined was refactored. The initial code handled both cases, whether a hostname or an IP address was returned. While it is true that Juju may manage the hostname for the service, in some cases, such as AWS, kubectl get svc returns only a hostname for the service instead of an IP address. This results in Traefik being blocked with the status "gateway address unavailable".

I think returning the IP address first if it exists, and otherwise returning the hostname, makes sense for cases where different load balancers report different kinds of address.
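A minimal Python sketch of the fallback proposed above, assuming the Service is available as a plain dict parsed from kubectl get svc ... --output json. The function name pick_lb_address is hypothetical and this is not the charm's actual code; it only illustrates preferring an IP and falling back to a hostname.

```python
from typing import Optional


def pick_lb_address(service: dict) -> Optional[str]:
    """Hypothetical helper: prefer a load-balancer IP, else fall back to a hostname.

    `service` is a Kubernetes Service object as a plain dict, for example
    parsed from `kubectl get svc <name> --output json`.
    """
    status = service.get("status") or {}
    ingress = (status.get("loadBalancer") or {}).get("ingress") or []

    # First pass: return the first ingress entry that carries an IP address.
    for entry in ingress:
        if entry.get("ip"):
            return entry["ip"]

    # Second pass: no IP anywhere, so use a hostname (e.g. AWS ELB).
    for entry in ingress:
        if entry.get("hostname"):
            return entry["hostname"]

    # Nothing assigned yet; a caller would report the address as unavailable.
    return None


# Status blocks trimmed from the MetalLB and AWS ELB output in this issue.
metallb_svc = {"status": {"loadBalancer": {"ingress": [{"ip": "10.246.167.196"}]}}}
elb_svc = {
    "status": {
        "loadBalancer": {
            "ingress": [
                {"hostname": "ab23f47a63538431897155b2f26ca8a4-1618134996.us-east-1.elb.amazonaws.com"}
            ]
        }
    }
}

print(pick_lb_address(metallb_svc))  # 10.246.167.196
print(pick_lb_address(elb_svc))      # prints the ELB hostname
```

Scanning every ingress entry for an IP before considering any hostname means a status that lists both kinds still yields the IP, which matches the "IP first, then hostname" ordering suggested in this issue.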

The log output below shows the difference between the Service output on AWS and on MetalLB. SQA could also report what Octavia returns in this output when running Traefik on OpenStack (o7k) with a load balancer.

To Reproduce

  1. Deploy Charmed Kubernetes on AWS following these instructions: https://ubuntu.com/kubernetes/docs/aws-integration
  2. Run juju add-k8s to add the Charmed Kubernetes cluster to the AWS controller.
  3. Run juju deploy cos-lite.
  4. Wait for the deployment to settle and for Traefik to become blocked with "gateway address unavailable".
  5. Run kubectl --kubeconfig kube.conf get svc -n cos traefik --output json and observe that the load balancer status contains no IP address.
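Step 5 can be scripted for repeated test runs. The sketch below, assuming kubectl is on PATH and the kubeconfig file is kube.conf as in the steps above, runs the same command and classifies what the load balancer status reports. The helper names get_service and classify_lb_status are hypothetical.

```python
import json
import subprocess


def get_service(namespace: str, name: str, kubeconfig: str = "kube.conf") -> dict:
    """Run the kubectl command from step 5 and parse its JSON output."""
    result = subprocess.run(
        ["kubectl", "--kubeconfig", kubeconfig, "get", "svc",
         "-n", namespace, name, "--output", "json"],
        check=True,           # raise CalledProcessError if kubectl fails
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str
    )
    return json.loads(result.stdout)


def classify_lb_status(service: dict) -> str:
    """Return "ip", "hostname-only", or "none" for status.loadBalancer.ingress."""
    status = service.get("status") or {}
    ingress = (status.get("loadBalancer") or {}).get("ingress") or []
    if any(entry.get("ip") for entry in ingress):
        return "ip"
    if any(entry.get("hostname") for entry in ingress):
        return "hostname-only"  # the AWS ELB case that blocks Traefik
    return "none"
```

For example, classify_lb_status(get_service("cos", "traefik")) would be expected to return "hostname-only" on the AWS deployment described here and "ip" on MicroK8s with MetalLB, judging by the output in this issue.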

Environment

In the testing environments, SQA was running Charmed Kubernetes 1.28/stable on AWS with the aws-integrator charm, and on bare metal was running MicroK8s 1.28/stable with MetalLB. We have seen this issue in all stable versions of Juju 3.x and in candidate versions of Juju 3.5. In all cases we used latest/stable for Traefik.

Relevant log output

On MicroK8s with MetalLB, the Service output looks like this:

$ kubectl --kubeconfig kube.conf get svc -n cos traefik --output json
{
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {
        "annotations": {
            "controller.juju.is/id": "f1160f08-aaf7-41d0-8b49-0fa01b7f699b",
            "juju.is/version": "3.3.5",
            "metallb.universe.tf/ip-allocated-from-pool": "juju-system-microk8s-metallb",
            "model.juju.is/id": "8a98550d-b705-44b6-8ca4-06d7cc1181c9"
        },
        "creationTimestamp": "2024-06-27T17:18:03Z",
        "labels": {
            "app.juju.is/created-by": "traefik",
            "app.kubernetes.io/managed-by": "juju",
            "app.kubernetes.io/name": "traefik"
        },
        "name": "traefik",
        "namespace": "cos",
        "resourceVersion": "4290",
        "uid": "999b4ecf-37a2-4ec4-a76d-837cc421d1e1"
    },
    "spec": {
        "allocateLoadBalancerNodePorts": true,
        "clusterIP": "10.152.183.180",
        "clusterIPs": [
            "10.152.183.180"
        ],
        "externalTrafficPolicy": "Cluster",
        "internalTrafficPolicy": "Cluster",
        "ipFamilies": [
            "IPv4"
        ],
        "ipFamilyPolicy": "SingleStack",
        "ports": [
            {
                "name": "traefik",
                "nodePort": 31859,
                "port": 80,
                "protocol": "TCP",
                "targetPort": 80
            },
            {
                "name": "traefik-tls",
                "nodePort": 31320,
                "port": 443,
                "protocol": "TCP",
                "targetPort": 443
            }
        ],
        "selector": {
            "app.kubernetes.io/name": "traefik"
        },
        "sessionAffinity": "None",
        "type": "LoadBalancer"
    },
    "status": {
        "loadBalancer": {
            "ingress": [
                {
                    "ip": "10.246.167.196"
                }
            ]
        }
    }
}

On AWS using ELB, the output is:

$ kubectl --kubeconfig kube.conf get svc -n cos traefik --output json
{
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {
        "annotations": {
            "controller.juju.is/id": "c2a63cbe-4c3b-484d-8b61-c289f2260d5e",
            "juju.is/version": "3.3.5",
            "model.juju.is/id": "228b8510-e021-45b5-8b09-e4916256b514"
        },
        "creationTimestamp": "2024-06-27T17:45:42Z",
        "finalizers": [
            "service.kubernetes.io/load-balancer-cleanup"
        ],
        "labels": {
            "app.juju.is/created-by": "traefik",
            "app.kubernetes.io/managed-by": "juju",
            "app.kubernetes.io/name": "traefik"
        },
        "name": "traefik",
        "namespace": "cos",
        "resourceVersion": "7773",
        "uid": "b23f47a6-3538-4318-9715-5b2f26ca8a4a"
    },
    "spec": {
        "allocateLoadBalancerNodePorts": true,
        "clusterIP": "10.152.183.194",
        "clusterIPs": [
            "10.152.183.194"
        ],
        "externalTrafficPolicy": "Cluster",
        "internalTrafficPolicy": "Cluster",
        "ipFamilies": [
            "IPv4"
        ],
        "ipFamilyPolicy": "SingleStack",
        "ports": [
            {
                "name": "traefik",
                "nodePort": 32626,
                "port": 80,
                "protocol": "TCP",
                "targetPort": 80
            },
            {
                "name": "traefik-tls",
                "nodePort": 31108,
                "port": 443,
                "protocol": "TCP",
                "targetPort": 443
            }
        ],
        "selector": {
            "app.kubernetes.io/name": "traefik"
        },
        "sessionAffinity": "None",
        "type": "LoadBalancer"
    },
    "status": {
        "loadBalancer": {
            "ingress": [
                {
                    "hostname": "ab23f47a63538431897155b2f26ca8a4-1618134996.us-east-1.elb.amazonaws.com"
                }
            ]
        }
    }
}

Additional context

While COS isn't usually run on Charmed Kubernetes and doesn't have many uses on AWS at the moment, it is a good test workload for Juju and Charmed Kubernetes releases. Since Traefik is increasingly used as the standard ingress operator at Canonical, it makes sense for it to work in most (ideally all) environments.

Abuelodelanada commented 2 months ago

Hello @asbalderson !

I have published a PR for this.

Are you able to test it using latest/edge/fix378?

jeffreychang911 commented 2 months ago

It works in the SolQA AWS environment.

Model  Controller       Cloud/Region                Version  SLA          Timestamp
cos    foundations-k8s  kubernetes_cloud/us-east-1  3.5.2    unsupported  20:51:25Z

App           Version  Status  Scale  Charm             Channel             Rev  Address         Exposed  Message
alertmanager  0.27.0   active  1      alertmanager-k8s  latest/candidate    124  10.152.183.101  no
catalogue              active  1      catalogue-k8s     latest/candidate    59   10.152.183.208  no
grafana       9.5.3    active  1      grafana-k8s       latest/candidate    117  10.152.183.219  no
loki          2.9.6    active  1      loki-k8s          latest/candidate    158  10.152.183.220  no
prometheus    2.52.0   active  1      prometheus-k8s    latest/candidate    209  10.152.183.49   no
traefik       2.11.0   active  1      traefik-k8s       latest/edge/fix378  202  10.152.183.40   no       Serving at a1fe861d3f36d4590b4b683d2424bc72-20394484.us-east-1.elb.amazonaws.com

Unit            Workload  Agent  Address         Ports  Message
alertmanager/0  active    idle   192.168.20.205
catalogue/0     active    idle   192.168.59.140
grafana/0       active    idle   192.168.59.143
loki/0          active    idle   192.168.20.206
prometheus/0    active    idle   192.168.59.144
traefik/0       active    idle   192.168.20.207         Serving at a1fe861d3f36d4590b4b683d2424bc72-20394484.us-east-1.elb.amazonaws.com