cilium / cilium

eBPF-based Networking, Security, and Observability
https://cilium.io
Apache License 2.0

CFP: add topology labels to flows to allow filtering - example cross-zone traffic #33601

Open: youwalther65 opened this issue 3 months ago

youwalther65 commented 3 months ago

Cilium Feature Proposal

Is your proposed feature related to a problem?

For this example I will use AWS and Amazon EKS naming conventions; similar concepts may apply to other cloud vendors as well. For resiliency, it is a best practice to create EKS worker nodes in multiple availability zones (AZs). As a result, pod-to-pod traffic will often cross AZ boundaries, and cross-AZ traffic incurs cost. With Cilium overlay networking in particular, AWS features such as VPC Flow Logs cannot show which pods generate cross-AZ traffic, because they only see the encapsulated node-to-node packets. Hubble CLI/UI currently lack this capability as well.

Describe the feature you'd like

EKS worker nodes carry the well-known Kubernetes label topology.kubernetes.io/zone by default, for example topology.kubernetes.io/zone: eu-west-1b. Embedding this information into flows as labels would make it possible to use Hubble queries with --from-label and --to-label to see flows that cross AZs, and thereby identify the applications responsible. I am not yet sure whether it is possible to count packets/bytes to get a view of total traffic over time, which would help identify top talkers.
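To illustrate what the proposal would enable, here is a minimal Python sketch of how a consumer could decide whether a flow crosses zones, assuming the node labels of both endpoints are available as `key=value` strings. The helpers `label_value` and `is_cross_zone` are hypothetical and not part of Hubble.

```python
from typing import Iterable, Optional

# Well-known Kubernetes zone label key
ZONE_KEY = "topology.kubernetes.io/zone"


def label_value(labels: Iterable[str], key: str) -> Optional[str]:
    """Return the value of `key` from a list of "key=value" strings, or None."""
    for label in labels:
        k, sep, v = label.partition("=")
        if sep and k == key:
            return v
    return None


def is_cross_zone(src_node_labels: Iterable[str], dst_node_labels: Iterable[str]) -> bool:
    """True only when both zones are known and they differ (hypothetical helper)."""
    src = label_value(src_node_labels, ZONE_KEY)
    dst = label_value(dst_node_labels, ZONE_KEY)
    return src is not None and dst is not None and src != dst


if __name__ == "__main__":
    src = ["kubernetes.io/os=linux", "topology.kubernetes.io/zone=eu-west-1a"]
    dst = ["topology.kubernetes.io/zone=eu-west-1b"]
    print(is_cross_zone(src, dst))
```

Treating a missing zone label as "not cross-zone" is a conservative choice; a real report might prefer to surface such flows as "unknown" instead.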

nvibert commented 3 months ago

Good idea; this would be useful for any topology-aware mechanism, including the Service Traffic Distribution feature added in Kubernetes 1.30.

youwalther65 commented 2 months ago

According to the "Filtering Hubble flows by node labels" section of the 1.16 release blog, filtering by the node label topology.kubernetes.io/zone is now possible.
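The release blog describes filtering by node label inside Hubble itself. For flows that have already been exported, a client-side equivalent is simple. The following minimal Python sketch assumes the output of `hubble observe` in JSON format has been saved as text, either pretty-printed as in the samples in this thread or one object per line. The function names are hypothetical.

```python
import json
from typing import Iterator


def iter_json_objects(text: str) -> Iterator[dict]:
    """Yield each top-level JSON object from concatenated or line-delimited JSON."""
    decoder = json.JSONDecoder()
    idx = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            return
        obj, idx = decoder.raw_decode(text, idx)
        yield obj


def flows_with_node_label(text: str, wanted: str) -> Iterator[dict]:
    """Yield flows whose flow.node_labels contains the exact "key=value" string."""
    for record in iter_json_objects(text):
        flow = record.get("flow", {})
        if wanted in flow.get("node_labels", []):
            yield flow
```

For example, `flows_with_node_label(text, "topology.kubernetes.io/zone=eu-central-1a")` keeps only flows reported by nodes in that zone.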

rstoermer commented 1 month ago

I can see the node labels in the CLI JSON output, for example:

{
    "flow": {
        "time": "2024-08-20T13:44:39.452593052Z",
        "uuid": "87b881cf-6afd-4b81-9c0e-a095ba251769",
        "verdict": "FORWARDED",
        "ethernet": {
            "source": "b6:f8:6f:cf:05:30",
            "destination": "86:32:eb:cf:a9:f7"
        },
        "IP": {
            "source": "10.128.0.158",
            "destination": "10.128.2.85",
            "ipVersion": "IPv4"
        },
        "l4": {
            "UDP": {
                "source_port": 41470,
                "destination_port": 53
            }
        },
        "source": {
            "identity": 9456,
            "cluster_name": "kind-cilium-migration",
            "namespace": "default",
            "labels": [
                "k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default",
                "k8s:io.cilium.k8s.policy.cluster=kind-cilium-migration",
                "k8s:io.cilium.k8s.policy.serviceaccount=default",
                "k8s:io.kubernetes.pod.namespace=default",
                "k8s:run=tmp-shell"
            ],
            "pod_name": "tmp-shell"
        },
        "destination": {
            "ID": 4071,
            "identity": 21731,
            "namespace": "kube-system",
            "labels": [
                "k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=kube-system",
                "k8s:io.cilium.k8s.policy.cluster=kind-cilium-migration",
                "k8s:io.cilium.k8s.policy.serviceaccount=coredns",
                "k8s:io.kubernetes.pod.namespace=kube-system",
                "k8s:k8s-app=kube-dns"
            ],
            "pod_name": "coredns-7db6d8ff4d-hvmhb",
            "workloads": [
                {
                    "name": "coredns",
                    "kind": "Deployment"
                }
            ]
        },
        "Type": "L3_L4",
        "node_name": "kind-cilium-migration/cilium-migration-worker2",
        "node_labels": [
            "beta.kubernetes.io/arch=arm64",
            "beta.kubernetes.io/os=linux",
            "io.cilium.migration/cilium-default=true",
            "kubernetes.io/arch=arm64",
            "kubernetes.io/hostname=cilium-migration-worker2",
            "kubernetes.io/os=linux",
            "topology.kubernetes.io/zone=eu-central-1b"
        ],
        "event_type": {
            "type": 4
        },
        "traffic_direction": "EGRESS",
        "trace_observation_point": "TO_ENDPOINT",
        "trace_reason": "NEW",
        "is_reply": false,
        "interface": {
            "index": 33,
            "name": "lxc0bf9b51842a1"
        },
        "Summary": "UDP"
    },
    "node_name": "kind-cilium-migration/cilium-migration-worker2",
    "time": "2024-08-20T13:44:39.452593052Z"
}

What does flow.node_labels refer to: the destination or the source?

coredns-7db6d8ff4d-hvmhb runs on cilium-migration-worker2 with label topology.kubernetes.io/zone=eu-central-1b

tmp-shell runs on cilium-migration-worker with label topology.kubernetes.io/zone=eu-central-1a

So I would assume that the given node label, topology.kubernetes.io/zone=eu-central-1b, is that of the destination.

However, when looking at the reply, the label is the same, although the destination is in eu-central-1a:

{
    "flow": {
        "time": "2024-08-20T13:44:39.455688552Z",
        "uuid": "bcd4672c-0b6f-4a0a-9b37-e6fe481bfeb2",
        "verdict": "FORWARDED",
        "ethernet": {
            "source": "86:32:eb:cf:a9:f7",
            "destination": "b6:f8:6f:cf:05:30"
        },
        "IP": {
            "source": "10.128.2.85",
            "destination": "10.128.0.158",
            "ipVersion": "IPv4"
        },
        "l4": {
            "UDP": {
                "source_port": 53,
                "destination_port": 41470
            }
        },
        "source": {
            "ID": 4071,
            "identity": 21731,
            "namespace": "kube-system",
            "labels": [
                "k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=kube-system",
                "k8s:io.cilium.k8s.policy.cluster=kind-cilium-migration",
                "k8s:io.cilium.k8s.policy.serviceaccount=coredns",
                "k8s:io.kubernetes.pod.namespace=kube-system",
                "k8s:k8s-app=kube-dns"
            ],
            "pod_name": "coredns-7db6d8ff4d-hvmhb",
            "workloads": [
                {
                    "name": "coredns",
                    "kind": "Deployment"
                }
            ]
        },
        "destination": {
            "identity": 9456,
            "cluster_name": "kind-cilium-migration",
            "namespace": "default",
            "labels": [
                "k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default",
                "k8s:io.cilium.k8s.policy.cluster=kind-cilium-migration",
                "k8s:io.cilium.k8s.policy.serviceaccount=default",
                "k8s:io.kubernetes.pod.namespace=default",
                "k8s:run=tmp-shell"
            ],
            "pod_name": "tmp-shell"
        },
        "Type": "L3_L4",
        "node_name": "kind-cilium-migration/cilium-migration-worker2",
        "node_labels": [
            "beta.kubernetes.io/arch=arm64",
            "beta.kubernetes.io/os=linux",
            "io.cilium.migration/cilium-default=true",
            "kubernetes.io/arch=arm64",
            "kubernetes.io/hostname=cilium-migration-worker2",
            "kubernetes.io/os=linux",
            "topology.kubernetes.io/zone=eu-central-1b"
        ],
        "reply": true,
        "event_type": {
            "type": 4,
            "sub_type": 4
        },
        "traffic_direction": "INGRESS",
        "trace_observation_point": "TO_OVERLAY",
        "trace_reason": "REPLY",
        "is_reply": true,
        "interface": {
            "index": 15,
            "name": "cilium_vxlan"
        },
        "Summary": "UDP"
    },
    "node_name": "kind-cilium-migration/cilium-migration-worker2",
    "time": "2024-08-20T13:44:39.455688552Z"
}
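One way to test the source-versus-destination question is to compare `flow.node_name` with the `kubernetes.io/hostname` entry in `flow.node_labels`. In both the request and the reply above, they match (`cilium-migration-worker2`), even though the two flows travel in opposite directions. This suggests that `node_labels` describe the node that observed the flow rather than either endpoint. That is an inference from these samples, not documented Hubble semantics. A minimal Python check (hypothetical helper name) might look like this:

```python
def observing_node_matches_labels(record: dict) -> bool:
    """Check whether flow.node_labels describe the node named in flow.node_name."""
    flow = record["flow"]
    # node_name has the form "<cluster>/<node>"; keep only the node part
    node = flow["node_name"].split("/", 1)[-1]
    return f"kubernetes.io/hostname={node}" in flow.get("node_labels", [])
```

Running it over every exported record and finding no mismatches would support the "observing node" reading.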

Traffic to and from the public internet also carries node_labels in both directions:

{
    "flow": {
        "time": "2024-08-20T13:42:18.455358584Z",
        "uuid": "d5dab447-5c3f-4cb5-ac6e-3ee28b20461e",
        "verdict": "FORWARDED",
        "ethernet": {
            "source": "56:0c:5b:59:0a:1f",
            "destination": "ce:b5:4e:66:6c:6b"
        },
        "IP": {
            "source": "10.128.0.158",
            "destination": "142.250.181.195",
            "ipVersion": "IPv4"
        },
        "l4": {
            "TCP": {
                "source_port": 56304,
                "destination_port": 80,
                "flags": {
                    "SYN": true
                }
            }
        },
        "source": {
            "ID": 1703,
            "identity": 9456,
            "namespace": "default",
            "labels": [
                "k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default",
                "k8s:io.cilium.k8s.policy.cluster=kind-cilium-migration",
                "k8s:io.cilium.k8s.policy.serviceaccount=default",
                "k8s:io.kubernetes.pod.namespace=default",
                "k8s:run=tmp-shell"
            ],
            "pod_name": "tmp-shell"
        },
        "destination": {
            "identity": 2,
            "labels": [
                "reserved:world"
            ]
        },
        "Type": "L3_L4",
        "node_name": "kind-cilium-migration/cilium-migration-worker",
        "node_labels": [
            "beta.kubernetes.io/arch=arm64",
            "beta.kubernetes.io/os=linux",
            "io.cilium.migration/cilium-default=true",
            "kubernetes.io/arch=arm64",
            "kubernetes.io/hostname=cilium-migration-worker",
            "kubernetes.io/os=linux",
            "topology.kubernetes.io/zone=eu-central-1a"
        ],
        "event_type": {
            "type": 4,
            "sub_type": 3
        },
        "traffic_direction": "EGRESS",
        "trace_observation_point": "TO_STACK",
        "trace_reason": "NEW",
        "is_reply": false,
        "Summary": "TCP Flags: SYN"
    },
    "node_name": "kind-cilium-migration/cilium-migration-worker",
    "time": "2024-08-20T13:42:18.455358584Z"
}
{
    "flow": {
        "time": "2024-08-20T13:42:18.456015417Z",
        "uuid": "0736d96c-ac3e-4c64-89f7-cee617a3d840",
        "verdict": "FORWARDED",
        "ethernet": {
            "source": "ce:b5:4e:66:6c:6b",
            "destination": "56:0c:5b:59:0a:1f"
        },
        "IP": {
            "source": "142.250.181.195",
            "destination": "10.128.0.158",
            "ipVersion": "IPv4"
        },
        "l4": {
            "TCP": {
                "source_port": 80,
                "destination_port": 56304,
                "flags": {
                    "SYN": true,
                    "ACK": true
                }
            }
        },
        "source": {
            "identity": 2,
            "labels": [
                "reserved:world"
            ]
        },
        "destination": {
            "ID": 1703,
            "identity": 9456,
            "namespace": "default",
            "labels": [
                "k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=default",
                "k8s:io.cilium.k8s.policy.cluster=kind-cilium-migration",
                "k8s:io.cilium.k8s.policy.serviceaccount=default",
                "k8s:io.kubernetes.pod.namespace=default",
                "k8s:run=tmp-shell"
            ],
            "pod_name": "tmp-shell"
        },
        "Type": "L3_L4",
        "node_name": "kind-cilium-migration/cilium-migration-worker",
        "node_labels": [
            "beta.kubernetes.io/arch=arm64",
            "beta.kubernetes.io/os=linux",
            "io.cilium.migration/cilium-default=true",
            "kubernetes.io/arch=arm64",
            "kubernetes.io/hostname=cilium-migration-worker",
            "kubernetes.io/os=linux",
            "topology.kubernetes.io/zone=eu-central-1a"
        ],
        "reply": true,
        "event_type": {
            "type": 4
        },
        "traffic_direction": "EGRESS",
        "trace_observation_point": "TO_ENDPOINT",
        "trace_reason": "REPLY",
        "is_reply": true,
        "interface": {
            "index": 39,
            "name": "lxcd05d3f2b91a5"
        },
        "Summary": "TCP Flags: SYN, ACK"
    },
    "node_name": "kind-cilium-migration/cilium-migration-worker",
    "time": "2024-08-20T13:42:18.456015417Z"
}

Any help interpreting this? Ideally, I want to be able to export these logs to analyse traffic going from nodes in AZ A to nodes in AZ B, as well as traffic to and from the public internet.
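For that goal, here is a rough Python sketch of per-zone-pair aggregation. Each flow in these samples only carries the labels of one node, so the sketch assumes a hypothetical `pod_zone` mapping from `namespace/pod` to zone. That mapping would be built separately, for example from each pod's node and that node's `topology.kubernetes.io/zone` label. Endpoints labelled `reserved:world` are bucketed as `world`. The sketch counts flow events, not bytes, and skips replies.

```python
from collections import Counter
from typing import Dict, Iterable

WORLD_LABEL = "reserved:world"


def endpoint_key(ep: dict) -> str:
    """Identify an endpoint as "namespace/pod", or "world" for reserved:world."""
    if WORLD_LABEL in ep.get("labels", []):
        return "world"
    return f'{ep.get("namespace", "")}/{ep.get("pod_name", "")}'


def zone_pair_counts(flows: Iterable[dict], pod_zone: Dict[str, str]) -> Counter:
    """Count non-reply flow events per (source zone, destination zone).

    pod_zone is a hypothetical, externally built map of "namespace/pod" -> zone.
    """
    counts: Counter = Counter()
    for flow in flows:
        if flow.get("is_reply"):
            continue  # count only the request direction
        src = endpoint_key(flow.get("source", {}))
        dst = endpoint_key(flow.get("destination", {}))
        src_zone = "world" if src == "world" else pod_zone.get(src, "unknown")
        dst_zone = "world" if dst == "world" else pod_zone.get(dst, "unknown")
        counts[(src_zone, dst_zone)] += 1
    return counts
```

The same request can be reported on both the source and the destination node, so these counts may include duplicates. Restricting the input to a single observation point would be one way to reduce that.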

rstoermer commented 1 month ago

Alright, #34133 provides some guidance on interpretation, and the feature seems like just what I need, so I will follow this issue closely. Thanks for the great work! :)