cilium / cilium-cli

CLI to install, manage & troubleshoot Kubernetes clusters running Cilium
https://cilium.io
Apache License 2.0

Connectivity Test: AKS Cluster failing node-to-node-encryption #2105

Open jpayne3506 opened 8 months ago

jpayne3506 commented 8 months ago

Bug report

General Information

This result occurs on 3 different agent versions: v1.12.10, v1.4.3, v1.15.0 (nightly release)

How to reproduce the issue

  1. instruction 1

     cilium connectivity test --test node-to-node-encryption -d -v

     Result:

     [-] Scenario [node-to-node-encryption/node-to-node-encryption]
     🐛 node-to-node-encryption test running in sanity mode, expecting unencrypted packets
     🐛 Running /bin/sh -c ip -o route get 10.10.0.5 from 192.168.1.94 iif lo | grep -oE 'dev [^ ]' | cut -d' ' -f2
     🐛 Running in bg: tcpdump -i eth0 --immediate-mode -w /tmp/node-to-node-encryption-host-netns-zh6p2.pcap src host 192.168.1.94 and icmp and dst host 10.10.0.5
     [.] Action [node-to-node-encryption/node-to-node-encryption/ping-ipv4: cilium-test/client-c4bfddc44-24nrv (192.168.1.94) -> cilium-test/host-netns-jm8gf (10.10.0.5:0)]
     🐛 Executing command [ping -c 1 -W 2 -w 10 10.10.0.5]
     ❌ Expected to see unencrypted packets, but none found. This check might be broken
     📄 No flows recorded for peer cilium-test/client-c4bfddc44-24nrv during action ping-ipv4
     📄 No flows recorded for peer cilium-test/host-netns-jm8gf during action ping-ipv4
     🐛 Running /bin/sh -c ip -o route get 10.10.0.5 from 10.10.0.4 | grep -oE 'dev [^ ]' | cut -d' ' -f2
     🐛 Running in bg: tcpdump -i eth0 --immediate-mode -w /tmp/node-to-node-encryption-host-netns-zh6p2.pcap src host 10.10.0.4 and icmp and dst host 10.10.0.5
     🐛 Running /bin/sh -c ip -o route get 10.10.0.4 from 10.10.0.5 | grep -oE 'dev [^ ]' | cut -d' ' -f2
     🐛 Running in bg: tcpdump -i eth0 --immediate-mode -w /tmp/node-to-node-encryption-host-netns-jm8gf.pcap src host 10.10.0.5 and icmp and dst host 10.10.0.4
     [.] Action [node-to-node-encryption/node-to-node-encryption/ping-ipv4: cilium-test/host-netns-zh6p2 (10.10.0.4) -> cilium-test/host-netns-jm8gf (10.10.0.5:0)]
     🐛 Executing command [ping -c 1 -W 2 -w 10 10.10.0.5]
     🐛 Running /bin/sh -c ip -o route get 192.168.0.107 from 10.10.0.4 | grep -oE 'dev [^ ]' | cut -d' ' -f2
     🐛 Running in bg: tcpdump -i eth0 --immediate-mode -w /tmp/node-to-node-encryption-host-netns-zh6p2.pcap src host 10.10.0.4 and tcp and dst host 192.168.0.107
     [.] Action [node-to-node-encryption/node-to-node-encryption/curl-ipv4: cilium-test/host-netns-zh6p2 (10.10.0.4) -> cilium-test/echo-other-node-64b998965-6d7pm (192.168.0.107:8080)]
     🐛 Executing command [curl -w %{local_ip}:%{local_port} -> %{remote_ip}:%{remote_port} = %{response_code} --silent --fail --show-error --output /dev/null --connect-timeout 2 --max-time 10 http://192.168.0.107:8080]
     🐛 Finalizing Test node-to-node-encryption

Features Enabled/General system information:

🐛 Detected features:
🐛   cidr-match-nodes: Disabled
🐛   cilium-network-policy: Enabled
🐛   cni-chaining: Disabled:none
🐛   enable-envoy-config: Disabled
🐛   enable-gateway-api: Disabled
🐛   enable-ipv4-egress-gateway: Disabled
🐛   encryption-node: Disabled
🐛   encryption-pod: Disabled:disabled
🐛   endpoint-routes: Enabled
🐛   flavor: Enabled:aks
🐛   health-checking: Disabled
🐛   host-firewall: Disabled
🐛   host-port: Enabled
🐛   icmp-policy: Enabled
🐛   ingress-controller: Disabled
🐛   ipv4: Enabled
🐛   ipv6: Disabled
🐛   k8s-network-policy: Enabled
🐛   kpr-external-ips: Enabled
🐛   kpr-graceful-termination: Enabled
🐛   kpr-hostport: Enabled
🐛   kpr-mode: Enabled:Strict
🐛   kpr-nodeport: Enabled
🐛   kpr-session-affinity: Enabled
🐛   kpr-socket-lb: Enabled
🐛   l7-proxy: Disabled
🐛   monitor-aggregation: Enabled:medium
🐛   mutual-auth-spiffe: Disabled
🐛   node-without-cilium: Disabled
🐛   secret-backend-k8s: Disabled
🐛   tunnel: Disabled:vxlan
🐛   wireguard-encapsulate: Disabled
ℹ️ Monitor aggregation detected, will skip some flow validation steps
ℹ️ Skipping tests that require a node Without Cilium
🐛 Validating Deployments...
⌛ [ciliumnightly-816c6c11] Waiting for deployment cilium-test/client to become ready...
⌛ [ciliumnightly-816c6c11] Waiting for deployment cilium-test/client2 to become ready...
⌛ [ciliumnightly-816c6c11] Waiting for deployment cilium-test/echo-same-node to become ready...
⌛ [ciliumnightly-816c6c11] Waiting for deployment cilium-test/echo-other-node to become ready...
⌛ [ciliumnightly-816c6c11] Waiting for CiliumEndpoint for pod cilium-test/client-c4bfddc44-24nrv to appear...
⌛ [ciliumnightly-816c6c11] Waiting for CiliumEndpoint for pod cilium-test/client2-5c6c769648-c6jzw to appear...
⌛ [ciliumnightly-816c6c11] Waiting for pod cilium-test/client-c4bfddc44-24nrv to reach DNS server on cilium-test/echo-same-node-5988bfdbc-m7xqx pod...
⌛ [ciliumnightly-816c6c11] Waiting for pod cilium-test/client2-5c6c769648-c6jzw to reach DNS server on cilium-test/echo-same-node-5988bfdbc-m7xqx pod...
⌛ [ciliumnightly-816c6c11] Waiting for pod cilium-test/client-c4bfddc44-24nrv to reach DNS server on cilium-test/echo-other-node-64b998965-6d7pm pod...
⌛ [ciliumnightly-816c6c11] Waiting for pod cilium-test/client2-5c6c769648-c6jzw to reach DNS server on cilium-test/echo-other-node-64b998965-6d7pm pod...
⌛ [ciliumnightly-816c6c11] Waiting for pod cilium-test/client-c4bfddc44-24nrv to reach default/kubernetes service...
⌛ [ciliumnightly-816c6c11] Waiting for pod cilium-test/client2-5c6c769648-c6jzw to reach default/kubernetes service...
⌛ [ciliumnightly-816c6c11] Waiting for CiliumEndpoint for pod cilium-test/echo-other-node-64b998965-6d7pm to appear...
⌛ [ciliumnightly-816c6c11] Waiting for CiliumEndpoint for pod cilium-test/echo-same-node-5988bfdbc-m7xqx to appear...
⌛ [ciliumnightly-816c6c11] Waiting for Service cilium-test/echo-other-node to become ready...
⌛ [ciliumnightly-816c6c11] Waiting for Service cilium-test/echo-other-node to be synchronized by Cilium pod kube-system/cilium-rxbwp
⌛ [ciliumnightly-816c6c11] Waiting for Service cilium-test/echo-same-node to become ready...
⌛ [ciliumnightly-816c6c11] Waiting for Service cilium-test/echo-same-node to be synchronized by Cilium pod kube-system/cilium-rxbwp
⌛ [ciliumnightly-816c6c11] Waiting for NodePort 10.10.0.4:31195 (cilium-test/echo-same-node) to become ready...
⌛ [ciliumnightly-816c6c11] Waiting for NodePort 10.10.0.4:30233 (cilium-test/echo-other-node) to become ready...
⌛ [ciliumnightly-816c6c11] Waiting for NodePort 10.10.0.5:30233 (cilium-test/echo-other-node) to become ready...
⌛ [ciliumnightly-816c6c11] Waiting for NodePort 10.10.0.5:31195 (cilium-test/echo-same-node) to become ready...
ℹ️ Skipping IPCache check
🔭 Enabling Hubble telescope...
⚠️ Unable to contact Hubble Relay, disabling Hubble telescope and flow validation: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:4245: connect: connection refused"
ℹ️ Expose Relay locally with:
   cilium hubble enable
   cilium hubble port-forward&
ℹ️ Cilium version: 1.15.0
🐛 Registered connectivity tests:
🐛   <Test no-policies, 8 scenarios, 0 CNPs, expectFunc >
🐛   <Test no-policies-extra, 2 scenarios, 0 CNPs, expectFunc >
🐛   <Test allow-all-except-world, 5 scenarios, 1 CNPs, expectFunc >
🐛   <Test client-ingress, 1 scenarios, 1 CNPs, expectFunc 0x2752380>
🐛   <Test client-ingress-knp, 1 scenarios, 0 CNPs, expectFunc 0x27525a0>
🐛   <Test allow-all-with-metrics-check, 1 scenarios, 0 CNPs, expectFunc 0x27516a0>
🐛   <Test all-ingress-deny, 2 scenarios, 1 CNPs, expectFunc 0x2751380>
🐛   <Test all-ingress-deny-knp, 2 scenarios, 0 CNPs, expectFunc 0x2750d40>
🐛   <Test all-egress-deny, 2 scenarios, 1 CNPs, expectFunc 0x27527c0>
🐛   <Test all-egress-deny-knp, 2 scenarios, 0 CNPs, expectFunc 0x27528a0>
🐛   <Test all-entities-deny, 2 scenarios, 1 CNPs, expectFunc 0x2752980>
🐛   <Test cluster-entity, 1 scenarios, 1 CNPs, expectFunc 0x2752a60>
🐛   <Test host-entity, 1 scenarios, 1 CNPs, expectFunc 0x2752c20>
🐛   <Test echo-ingress, 1 scenarios, 1 CNPs, expectFunc 0x2752d00>
🐛   <Test echo-ingress-knp, 1 scenarios, 0 CNPs, expectFunc 0x27531c0>
🐛   <Test client-ingress-icmp, 1 scenarios, 1 CNPs, expectFunc 0x2753420>
🐛   <Test client-egress, 1 scenarios, 1 CNPs, expectFunc >
🐛   <Test client-egress-knp, 1 scenarios, 0 CNPs, expectFunc >
🐛   <Test client-egress-expression, 1 scenarios, 1 CNPs, expectFunc >
🐛   <Test client-egress-expression-knp, 1 scenarios, 0 CNPs, expectFunc >
🐛   <Test client-with-service-account-egress-to-echo, 1 scenarios, 1 CNPs, expectFunc >
🐛   <Test client-egress-to-echo-service-account, 1 scenarios, 1 CNPs, expectFunc 0x2753640>
🐛   <Test to-entities-world, 1 scenarios, 1 CNPs, expectFunc 0x27537e0>
🐛   <Test to-cidr-external, 1 scenarios, 1 CNPs, expectFunc 0x2750980>
🐛   <Test to-cidr-external-knp, 1 scenarios, 0 CNPs, expectFunc 0x2750720>
🐛   <Test echo-ingress-from-other-client-deny, 3 scenarios, 3 CNPs, expectFunc 0x2753a40>
🐛   <Test client-ingress-from-other-client-icmp-deny, 2 scenarios, 3 CNPs, expectFunc 0x2753ca0>
🐛   <Test client-egress-to-echo-deny, 2 scenarios, 3 CNPs, expectFunc 0x2753f00>
🐛   <Test client-ingress-to-echo-named-port-deny, 2 scenarios, 3 CNPs, expectFunc 0x2754180>
🐛   <Test client-egress-to-echo-expression-deny, 2 scenarios, 3 CNPs, expectFunc 0x27543e0>
🐛   <Test client-with-service-account-egress-to-echo-deny, 2 scenarios, 3 CNPs, expectFunc 0x2754640>
🐛   <Test client-egress-to-echo-service-account-deny, 1 scenarios, 3 CNPs, expectFunc 0x27548a0>
🐛   <Test client-egress-to-cidr-deny, 1 scenarios, 2 CNPs, expectFunc 0x27500e0>
🐛   <Test client-egress-to-cidr-deny-default, 1 scenarios, 1 CNPs, expectFunc 0x274fd40>
🐛   <Test health, 1 scenarios, 0 CNPs, expectFunc >
🐛   <Test north-south-loadbalancing, 1 scenarios, 0 CNPs, expectFunc >
🐛   <Test pod-to-pod-encryption, 1 scenarios, 0 CNPs, expectFunc >
🐛   <Test node-to-node-encryption, 1 scenarios, 0 CNPs, expectFunc >
🐛   <Test egress-gateway-excluded-cidrs, 1 scenarios, 0 CNPs, expectFunc >
🐛   <Test pod-to-node-cidrpolicy, 1 scenarios, 0 CNPs, expectFunc >
🐛   <Test north-south-loadbalancing-with-l7-policy, 1 scenarios, 1 CNPs, expectFunc >
🐛   <Test echo-ingress-l7, 1 scenarios, 1 CNPs, expectFunc 0x2754a40>
🐛   <Test echo-ingress-l7-named-port, 1 scenarios, 1 CNPs, expectFunc 0x2754d80>
🐛   <Test client-egress-l7-method, 2 scenarios, 2 CNPs, expectFunc 0x27550c0>
🐛   <Test client-egress-l7, 2 scenarios, 2 CNPs, expectFunc 0x274f6e0>
🐛   <Test client-egress-l7-named-port, 2 scenarios, 2 CNPs, expectFunc 0x274f1a0>
🐛   <Test client-egress-l7-tls-deny-without-headers, 1 scenarios, 1 CNPs, expectFunc 0x27553e0>
🐛   <Test client-egress-l7-tls-headers, 1 scenarios, 1 CNPs, expectFunc 0x27554c0>
🐛   <Test client-egress-l7-set-header, 2 scenarios, 1 CNPs, expectFunc 0x27555a0>
🐛   <Test echo-ingress-auth-always-fail, 1 scenarios, 1 CNPs, expectFunc 0x27557e0>
🐛   <Test echo-ingress-mutual-auth-spiffe, 1 scenarios, 1 CNPs, expectFunc >
🐛   <Test pod-to-ingress-service, 1 scenarios, 0 CNPs, expectFunc >
🐛   <Test pod-to-ingress-service-deny-all, 1 scenarios, 1 CNPs, expectFunc 0x27558c0>
🐛   <Test pod-to-ingress-service-allow-ingress-identity, 1 scenarios, 2 CNPs, expectFunc >
🐛   <Test dns-only, 2 scenarios, 1 CNPs, expectFunc 0x27559a0>
🐛   <Test to-fqdns, 2 scenarios, 1 CNPs, expectFunc 0x274e9c0>
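The detected-features dump above reports `encryption-node: Disabled` and `encryption-pod: Disabled:disabled`, which is consistent with the test announcing that it runs in sanity mode and expects unencrypted packets. As a rough aid for reading such a dump, here is a minimal Python sketch that extracts the `name: Enabled|Disabled[:mode]` pairs from saved debug output. The helper names `parse_features` and `encryption_enabled` are hypothetical and not part of cilium-cli; the regular expression only assumes the token shape visible in this log.

```python
import re

# Matches tokens such as "encryption-pod: Disabled:disabled" or "ipv4: Enabled".
# Assumption: feature names are lowercase letters, digits, and dashes, as in the log above.
FEATURE_RE = re.compile(r"([a-z0-9-]+): (Enabled|Disabled)(?::(\S+))?")


def parse_features(log_text):
    """Return {feature_name: (is_enabled, mode_or_None)} from debug output."""
    features = {}
    for name, state, mode in FEATURE_RE.findall(log_text):
        features[name] = (state == "Enabled", mode or None)
    return features


def encryption_enabled(features):
    """True if either encryption feature is reported as Enabled."""
    keys = ("encryption-node", "encryption-pod")
    return any(features.get(key, (False, None))[0] for key in keys)


# Excerpt of the feature list from the report, flattened onto one line.
sample = (
    "encryption-node: Disabled encryption-pod: Disabled:disabled "
    "flavor: Enabled:aks tunnel: Disabled:vxlan"
)
features = parse_features(sample)
print(features["flavor"])            # (True, 'aks')
print(encryption_enabled(features))  # False
```

Because the pattern keys on the `name: state` tokens rather than on line structure, it should work whether the log is pasted one feature per line or as a single run of text.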

kolovo commented 8 months ago

We are having the same issue on a vanilla installation of k8s with kubespray (v2.23.1), Cilium deployed with Helm (v1.14.3), and cilium-cli v0.15.14 on bare metal.

giorio94 commented 8 months ago

@jpayne3506 Thanks for the detailed report. I managed to reproduce your issue, which seems to be specific to Azure CNI Powered by Cilium: unlike what normally happens in native routing configurations, pod-to-node traffic gets masqueraded there. In any case, that error comes from a sanity check within that specific connectivity test, and it is safe to ignore because no encryption is enabled in the cluster.
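To illustrate why the capture can come back empty: the debug output shows the test capturing with the filter `src host 192.168.1.94 and icmp and dst host 10.10.0.5`, that is, keyed on the client pod's own IP. The sketch below is a simplified Python model of that filter, not cilium-cli's code. The `Packet` type and `matches_filter` helper are hypothetical, and it assumes that masquerading rewrites the source address to the node IP 10.10.0.4, which the log does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str    # source IP as seen on the wire
    dst: str    # destination IP
    proto: str  # "icmp" or "tcp"


def matches_filter(pkt, src_host, dst_host, proto):
    """Model of the capture filter `src host A and <proto> and dst host B`."""
    return pkt.src == src_host and pkt.dst == dst_host and pkt.proto == proto


POD_IP = "192.168.1.94"    # cilium-test/client pod, from the log
DST_NODE_IP = "10.10.0.5"  # cilium-test/host-netns-jm8gf, from the log
SRC_NODE_IP = "10.10.0.4"  # assumption: address the pod traffic is masqueraded to

# Without masquerading, the ping leaves the node with the pod IP as its source.
plain = Packet(src=POD_IP, dst=DST_NODE_IP, proto="icmp")
# With masquerading (assumed here), the source is rewritten before capture.
masqueraded = Packet(src=SRC_NODE_IP, dst=DST_NODE_IP, proto="icmp")

print(matches_filter(plain, POD_IP, DST_NODE_IP, "icmp"))        # True
print(matches_filter(masqueraded, POD_IP, DST_NODE_IP, "icmp"))  # False
```

Under these assumptions the masqueraded packet never matches the pod-IP filter, so the capture holds no packets and the sanity check reports "Expected to see unencrypted packets, but none found" even though the ping itself may have succeeded.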

@kolovo Your issue seems to be a different one, although possibly related. Could you please provide the output of cilium connectivity test --test node-to-node-encryption -d -v, along with a sysdump, for further investigation?