cilium / hubble

Hubble - Network, Service & Security Observability for Kubernetes using eBPF
Apache License 2.0

Hubble loses the FQDN reference a few moments after it is resolved #1438

Open rooque opened 3 months ago

rooque commented 3 months ago

I'm seeing some strange behavior with FQDN destinations in Hubble. I have two FQDNs that two pods connect to:

redis.sandbox.whitelabel.com.br
db.sandbox.whitelabel.com.br

At first, the FQDNs appear correctly both in the service map and in the flow list. After a few moments, Hubble starts treating the FQDNs' IPs as "world" (see images).

Images

Screenshot 2024-03-28 15:36:20, Screenshot 2024-03-28 15:30:07, Screenshot 2024-03-28 15:29:09, Screenshot 2024-03-28 15:28:58, Screenshot 2024-03-28 15:28:17

The problem is not just in the UI; the CLI shows the same thing:

Apr  2 15:56:45.180: services/poc-bff-66f46654c-7565m:45148 (ID:103419) -> redis.sandbox.whitelabel.com.br:6379 (ID:16777220) to-stack FORWARDED (TCP Flags: ACK)
Apr  2 15:56:45.180: services/poc-bff-66f46654c-7565m:45148 (ID:103419) <- redis.sandbox.whitelabel.com.br:6379 (ID:16777220) to-endpoint FORWARDED (TCP Flags: ACK)
Apr  2 15:56:49.724: services/poc-microservice-595c9fb49b-gcwvh:58090 (ID:82048) -> redis.sandbox.whitelabel.com.br:6379 (ID:16777220) to-stack FORWARDED (TCP Flags: ACK)
Apr  2 15:56:49.724: services/poc-microservice-595c9fb49b-gcwvh:58090 (ID:82048) <- redis.sandbox.whitelabel.com.br:6379 (ID:16777220) to-endpoint FORWARDED (TCP Flags: ACK)
Apr  2 15:57:05.276: services/poc-bff-66f46654c-7565m:45148 (ID:103419) -> redis.sandbox.whitelabel.com.br:6379 (ID:16777220) to-stack FORWARDED (TCP Flags: ACK)
Apr  2 15:57:05.276: services/poc-bff-66f46654c-7565m:45148 (ID:103419) <- redis.sandbox.whitelabel.com.br:6379 (ID:16777220) to-endpoint FORWARDED (TCP Flags: ACK)

But after some time (30 seconds):

Apr  2 15:57:40.028: services/poc-microservice-595c9fb49b-gcwvh:58090 (ID:82048) -> 10.6.132.252:6379 (ID:16777220) to-stack FORWARDED (TCP Flags: ACK)
Apr  2 15:57:40.028: services/poc-bff-66f46654c-7565m:45148 (ID:103419) -> 10.6.132.252:6379 (ID:16777220) to-stack FORWARDED (TCP Flags: ACK)
Apr  2 15:57:40.028: services/poc-microservice-595c9fb49b-gcwvh:58090 (ID:82048) <- 10.6.132.252:6379 (ID:16777220) to-endpoint FORWARDED (TCP Flags: ACK)
Apr  2 15:57:40.028: services/poc-bff-66f46654c-7565m:45148 (ID:103419) <- 10.6.132.252:6379 (ID:16777220) to-endpoint FORWARDED (TCP Flags: ACK)
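
For reference, these flows can be followed from the CLI with a filter like the one below; the namespace and FQDN come from the output above, and the exact flags assume a reasonably recent Hubble CLI.

```shell
# Follow flows from the services namespace towards the Redis FQDN.
# --to-fqdn should only match flows whose destination IP Hubble can
# still map back to a name, so once the destination degrades to
# "world" the same traffic has to be found by IP instead.
hubble observe --follow --namespace services \
  --to-fqdn "redis.sandbox.whitelabel.com.br"

# Same traffic, filtered by the destination IP seen after ~30 seconds:
hubble observe --follow --namespace services --to-ip 10.6.132.252
```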

NetworkPolicy

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: bff-rule
  namespace: services
spec:
  endpointSelector:
    matchLabels:
      app: poc-bff
  ingress:
    - fromEndpoints:
        - {}
    - fromEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: envoy-gateway-system
  egress:
    - toEndpoints:
        - {}
    - toEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"
    - toEndpoints:
        - matchLabels:
            io.kubernetes.pod.namespace: observability
    - toFQDNs:
        - matchPattern: "*.sandbox.whitelabel.com.br"
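
As a side note, how the agent tracks the toFQDNs selector and the names behind it can be inspected on the node; a rough sketch, assuming a recent cilium-dbg and using a placeholder agent pod name (cilium-xxxxx):

```shell
# Placeholder: the cilium agent pod on the node running the workload.
CILIUM_POD=cilium-xxxxx

# Selectors derived from policies, including the ToFQDN selector for
# *.sandbox.whitelabel.com.br and the identities it currently selects.
kubectl exec -n kube-system "$CILIUM_POD" -c cilium-agent -- \
  cilium-dbg policy selectors

# DNS names and IPs the agent currently associates with endpoints.
kubectl exec -n kube-system "$CILIUM_POD" -c cilium-agent -- \
  cilium-dbg fqdn cache list
```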

The FQDNs always show up in this list, even after Hubble starts showing them as world:

 # cilium-dbg fqdn cache list
Endpoint   Source       FQDN                               TTL   ExpirationTime             IPs            
1117       connection   redis.sandbox.whitelabel.com.br.   0     2024-04-02T14:16:09.553Z   10.6.132.252   
1459       connection   redis.sandbox.whitelabel.com.br.   0     2024-04-02T14:16:09.553Z   10.6.132.252   
1459       connection   db.sandbox.whitelabel.com.br.      0     2024-04-02T14:16:09.553Z   10.45.48.3 
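
The identity shown in the flows can also be cross-checked against the agent's ipcache; a small sketch, assuming the commands are run inside the agent pod (ID 16777220 falls in the locally-scoped CIDR identity range):

```shell
# Which identity does the destination IP currently map to?
cilium-dbg ipcache get 10.6.132.252

# What labels sit behind the identity shown in the flows?
cilium-dbg identity get 16777220
```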

Also: every time I restart the pods, Hubble shows the FQDNs for a very short time, then it starts showing them as world again.

Cilium Config

Client: 1.15.2 7cf57829 2024-03-13T15:34:43+02:00 go version go1.21.8 linux/amd64
Daemon: 1.15.2 7cf57829 2024-03-13T15:34:43+02:00 go version go1.21.8 linux/amd64

```yaml
agent-not-ready-taint-key: node.cilium.io/agent-not-ready
arping-refresh-period: 30s
auto-direct-node-routes: 'false'
bpf-lb-acceleration: disabled
bpf-lb-external-clusterip: 'false'
bpf-lb-map-max: '65536'
bpf-lb-sock: 'false'
bpf-map-dynamic-size-ratio: '0.0025'
bpf-policy-map-max: '16384'
bpf-root: /sys/fs/bpf
cgroup-root: /run/cilium/cgroupv2
cilium-endpoint-gc-interval: 5m0s
cluster-id: '1'
cluster-name: gke-1
cni-exclusive: 'true'
cni-log-file: /var/run/cilium/cilium-cni.log
controller-group-metrics: write-cni-file sync-host-ips sync-lb-maps-with-k8s-services
custom-cni-conf: 'false'
debug: 'false'
debug-verbose: ''
dnsproxy-enable-transparent-mode: 'true'
egress-gateway-reconciliation-trigger-interval: 1s
enable-auto-protect-node-port-range: 'true'
enable-bgp-control-plane: 'false'
enable-bpf-clock-probe: 'false'
enable-endpoint-health-checking: 'true'
enable-endpoint-routes: 'true'
enable-envoy-config: 'true'
enable-external-ips: 'false'
enable-health-check-loadbalancer-ip: 'true'
enable-health-check-nodeport: 'true'
enable-health-checking: 'true'
enable-host-port: 'false'
enable-hubble: 'true'
enable-hubble-open-metrics: 'true'
enable-ipv4: 'true'
enable-ipv4-big-tcp: 'false'
enable-ipv4-masquerade: 'true'
enable-ipv6: 'false'
enable-ipv6-big-tcp: 'false'
enable-ipv6-masquerade: 'true'
enable-k8s-networkpolicy: 'true'
enable-k8s-terminating-endpoint: 'true'
enable-l2-neigh-discovery: 'true'
enable-l7-proxy: 'true'
enable-local-redirect-policy: 'false'
enable-masquerade-to-route-source: 'false'
enable-metrics: 'true'
enable-node-port: 'false'
enable-policy: default
enable-remote-node-identity: 'true'
enable-sctp: 'false'
enable-svc-source-range-check: 'true'
enable-vtep: 'false'
enable-well-known-identities: 'false'
enable-wireguard: 'true'
enable-xt-socket-fallback: 'true'
external-envoy-proxy: 'true'
hubble-disable-tls: 'false'
hubble-export-file-max-backups: '5'
hubble-export-file-max-size-mb: '10'
hubble-listen-address: ':4244'
hubble-metrics: >-
  dns drop tcp flow port-distribution icmp
  httpV2:exemplars=true;labelsContext=source_ip,source_namespace,source_workload,destination_ip,destination_namespace,destination_workload,traffic_direction
hubble-metrics-server: ':9965'
hubble-socket-path: /var/run/cilium/hubble.sock
hubble-tls-cert-file: /var/lib/cilium/tls/hubble/server.crt
hubble-tls-client-ca-files: /var/lib/cilium/tls/hubble/client-ca.crt
hubble-tls-key-file: /var/lib/cilium/tls/hubble/server.key
identity-allocation-mode: crd
identity-gc-interval: 15m0s
identity-heartbeat-timeout: 30m0s
install-no-conntrack-iptables-rules: 'false'
ipam: kubernetes
ipam-cilium-node-update-rate: 15s
ipv4-native-routing-cidr: 10.0.0.0/18
k8s-client-burst: '20'
k8s-client-qps: '10'
kube-proxy-replacement: 'false'
kube-proxy-replacement-healthz-bind-address: ''
loadbalancer-l7: envoy
loadbalancer-l7-algorithm: round_robin
loadbalancer-l7-ports: ''
max-connected-clusters: '255'
mesh-auth-enabled: 'true'
mesh-auth-gc-interval: 5m0s
mesh-auth-queue-size: '1024'
mesh-auth-rotated-identities-queue-size: '1024'
monitor-aggregation: medium
monitor-aggregation-flags: all
monitor-aggregation-interval: 5s
node-port-bind-protection: 'true'
nodes-gc-interval: 5m0s
operator-api-serve-addr: 127.0.0.1:9234
operator-prometheus-serve-addr: ':9963'
policy-cidr-match-mode: ''
preallocate-bpf-maps: 'false'
procfs: /host/proc
prometheus-serve-addr: ':9962'
proxy-connect-timeout: '2'
proxy-max-connection-duration-seconds: '0'
proxy-max-requests-per-connection: '0'
remove-cilium-node-taints: 'true'
routing-mode: native
service-no-backend-response: reject
set-cilium-is-up-condition: 'true'
set-cilium-node-taints: 'true'
sidecar-istio-proxy-image: cilium/istio_proxy
skip-cnp-status-startup-clean: 'false'
synchronize-k8s-nodes: 'true'
tofqdns-dns-reject-response-code: refused
tofqdns-enable-dns-compression: 'true'
tofqdns-endpoint-max-ip-per-hostname: '50'
tofqdns-idle-connection-grace-period: 0s
tofqdns-max-deferred-connection-deletes: '10000'
tofqdns-proxy-response-max-delay: 100ms
unmanaged-pod-watcher-interval: '15'
vtep-cidr: ''
vtep-endpoint: ''
vtep-mac: ''
vtep-mask: ''
wireguard-persistent-keepalive: 0s
write-cni-conf-when-ready: /host/etc/cni/net.d/05-cilium.conflist
```

SysDump

[cilium-sysdump-20240403-174046.zip](https://github.com/cilium/hubble/files/14858310/cilium-sysdump-20240403-174046.zip)

saintdle commented 3 months ago

Hi, I spent some time with some of the cilium maintainers to understand this issue better. Here's what has been found so far.

The FQDN cache keeps two kinds of records: lookups, which record a name being resolved to an IP by an external DNS server and carry a TTL, and connections, which track active connections from pods in the datapath and do not have a TTL set.

In your FQDN cache it seems like there are no lookup entries because their TTL has expired; however, you still have connection entries because the application is still communicating externally. We expect that your application is keeping a long-lived connection and not making a subsequent lookup: we couldn't see any DNS lookups from the pods in the Hubble flows from the sysdump, and there are no SYN flows to those identities either.
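
If you want to verify this on your side, the flows can be checked directly; roughly something like the following, assuming a recent Hubble CLI (flag names may vary slightly between versions):

```shell
# DNS lookups from the workloads; with a long-lived connection there
# should be no recent queries for the Redis FQDN.
hubble observe --namespace services --protocol dns --last 1000

# New connection attempts (SYN) towards the Redis IP; an established
# long-lived connection only shows ACK flows, as in the output above.
hubble observe --namespace services --to-ip 10.6.132.252 --tcp-flags SYN
```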

Below is a quick capture from my lab showing the lookup records in the FQDN cache.

 k exec -n kube-system cilium-7hdqd -it -- cilium-dbg fqdn cache list
Defaulted container "cilium-agent" out of: cilium-agent, config (init), mount-bpf-fs (init), clean-cilium-state (init), install-cni-binaries (init)
Endpoint   Source       FQDN                                                                     TTL   ExpirationTime             IPs              
746        connection   jobs-app-kafka-brokers.tenant-jobs.svc.cluster.local.                    0     0001-01-01T00:00:00.000Z   10.244.2.48      
1279       lookup       api.github.com.                                                          2     2024-04-04T15:48:23.486Z   140.82.121.5     
1279       connection   loader.tenant-jobs.svc.cluster.local.                                    0     0001-01-01T00:00:00.000Z   10.109.69.105    
1279       connection   api.github.com.                                                          0     0001-01-01T00:00:00.000Z   140.82.121.6     
602        connection   elasticsearch-master.tenant-jobs.svc.cluster.local.                      0     0001-01-01T00:00:00.000Z   10.102.40.97     
350        connection   jobs-app-zookeeper-client.tenant-jobs.svc.cluster.local.                 0     0001-01-01T00:00:00.000Z   10.104.222.238   
350        connection   jobs-app-kafka-0.jobs-app-kafka-brokers.tenant-jobs.svc.cluster.local.   0     0001-01-01T00:00:00.000Z   10.244.2.48      

We looked into the Hubble code that pulls the IP/FQDN from the cache.

At the moment it seems like the behaviour is working as expected, or rather as coded. However, we think there is an opportunity to improve this behaviour.

Using the lookup field looks like the right thing to do, because with the connection field there is no guarantee that the FQDN and IP remain the same throughout a long-lived connection; the DNS record could be updated to a new IP address during that time.

However, because of that, the situation you have logged arises. We could fall back to using the connection item when there is no lookup item, and flag this in the hubble observe output in some way, so that you know it's a best-effort FQDN printout.

That would look something like this (example with an *):

Apr  2 15:56:45.180: services/poc-bff-66f46654c-7565m:45148 (ID:103419) -> redis.sandbox.whitelabel.com.br*:6379 (ID:16777220) to-stack FORWARDED (TCP Flags: ACK)

In this scenario where the TTL has expired for the lookup item in the cache, what would you like to happen?
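
In the meantime, forcing a fresh resolution from one of the affected pods should repopulate the lookup entry (which matches your observation that the FQDN reappears briefly after a pod restart). A rough sketch; the deployment name and the availability of nslookup/getent in the pod image are assumptions:

```shell
# Trigger a new DNS query from the workload so the DNS proxy records a
# fresh "lookup" entry (with a TTL) in the agent's FQDN cache.
kubectl exec -n services deploy/poc-bff -- \
  nslookup redis.sandbox.whitelabel.com.br

# Alternative if the image has no nslookup:
kubectl exec -n services deploy/poc-bff -- \
  getent hosts redis.sandbox.whitelabel.com.br
```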

rooque commented 3 months ago

Hello @saintdle !

What I expect is to see those FQDNs in Hubble and not "world", even for long-lived connections. If the lookup has expired but there is still a connection, it should show the FQDN and not "world".

Does that make sense?

ps. sorry for the delay.

saintdle commented 3 months ago

> Hello @saintdle!
>
> What I expect is to see those FQDNs in Hubble and not "world", even for long-lived connections. If the lookup has expired but there is still a connection, it should show the FQDN and not "world".
>
> Does that make sense?
>
> ps. sorry for the delay.

Yes, sure. In that case, when the FQDN that is shown comes from a connection entry because the lookup TTL has expired, it would be best to mark it as such.

macmiranda commented 3 months ago

Hey @rooque, just wondering if you have any issues with egress toFQDNs policies because of that. I see the same behavior in Hubble, and when I tried to create a CNP using matchName I started getting dropped packets even though the hostname was allowed by the policy. Not saying it's related, but it sort of makes sense that Cilium would drop reserved:world packets.
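
If that is what's happening, policy-verdict events should show the drops together with the identity that was matched; a quick check along these lines (flags assume a recent Hubble CLI):

```shell
# Policy verdicts for the workload's egress traffic; drops attributed to
# reserved:world instead of the FQDN identity would show up here.
hubble observe --namespace services --type policy-verdict --verdict DROPPED
```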

saintdle commented 1 month ago

@rooque I stumbled on this in the docs; maybe it's useful for this use case currently: https://docs.cilium.io/en/latest/contributing/development/debugging/#unintended-dns-policy-drops