Closed carlosrejano closed 4 weeks ago
Hi there, thanks for the bug report. It's not yet clear to me how exactly traffic is flowing. Could you outline the expected traffic flow, and indicate where you think it is failing?
In particular, I suggest the section on troubleshooting with hubble to identify where packets are being dropped. Can you go through the troubleshooting section and clarify the problem a bit?
Thanks.
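For reference, a minimal Hubble invocation for spotting drops might look like the following (a sketch assuming the Hubble CLI is installed and Hubble is enabled, as it is in the config below; the port filter is illustrative):

```shell
# Forward the Hubble relay to localhost (assumes the default kube-system install)
cilium hubble port-forward &

# Show only dropped flows, to see where packets are being discarded
hubble observe --verdict DROPPED --last 100

# Narrow down to traffic destined for the ingress NodePort (30123 is a placeholder)
hubble observe --verdict DROPPED --to-port 30123
```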
Also can you share your cilium configmap as well? Thanks.
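In case it helps, the configmap can be dumped with something like this (assuming the default `cilium-config` name in `kube-system`):

```shell
# Dump the Cilium agent configuration as key/value pairs
kubectl -n kube-system get configmap cilium-config -o yaml
```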
@squeed Hi, sorry for the delay. Yes, let me explain it better; correct me if I get something wrong. The idea is to use Cilium as an Ingress controller: when I create an Ingress object, it creates a Classic AWS LB or an NLB (I tried both), which balances traffic to the Cilium ingress controller. If I'm not mistaken, the Cilium component that handles traffic coming from the LB is cilium-envoy, which in my case runs inside cilium-agent. After arriving at cilium-envoy, the traffic is sent to the relevant backend of the Ingress. My problem is the communication between the load balancer and Envoy: the load balancer cannot reach Envoy most of the time.
Ask any other question that you need if I still did not explain it well enough.
Thanks for taking a look into this!
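For context, the kind of Ingress object in question looks roughly like this (a sketch; the name, host, and backend service are placeholders, and `ingressClassName: cilium` is what hands the object to Cilium's controller):

```shell
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress            # placeholder name
  annotations:
    # "dedicated" matches the ingress-default-lb-mode in the config below
    ingress.cilium.io/loadbalancer-mode: dedicated
spec:
  ingressClassName: cilium         # routes the Ingress to Cilium's controller
  rules:
  - host: app.example.com          # placeholder host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: example-backend  # placeholder backend service
            port:
              number: 80
EOF
```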
@sayboras Yes, here it is:
agent-not-ready-taint-key: node.cilium.io/agent-not-ready
arping-refresh-period: 30s
auto-direct-node-routes: "false"
bpf-lb-acceleration: disabled
bpf-lb-external-clusterip: "false"
bpf-lb-map-max: "65536"
bpf-lb-sock: "false"
bpf-map-dynamic-size-ratio: "0.0025"
bpf-policy-map-max: "16384"
bpf-root: /sys/fs/bpf
cgroup-root: /run/cilium/cgroupv2
cilium-endpoint-gc-interval: 5m0s
cluster-id: "0"
cluster-name: default
cluster-pool-ipv4-cidr: 10.0.0.0/8
cluster-pool-ipv4-mask-size: "24"
cni-chaining-mode: aws-cni
cni-exclusive: "false"
cni-log-file: /var/run/cilium/cilium-cni.log
custom-cni-conf: "false"
debug: "false"
debug-verbose: ""
egress-gateway-reconciliation-trigger-interval: 1s
enable-auto-protect-node-port-range: "true"
enable-bgp-control-plane: "false"
enable-bpf-clock-probe: "false"
enable-endpoint-health-checking: "false"
enable-endpoint-routes: "true"
enable-envoy-config: "true"
enable-external-ips: "false"
enable-gateway-api: "true"
enable-gateway-api-secrets-sync: "true"
enable-health-check-loadbalancer-ip: "false"
enable-health-check-nodeport: "true"
enable-health-checking: "true"
enable-host-legacy-routing: "true"
enable-host-port: "false"
enable-hubble: "true"
enable-ingress-controller: "true"
enable-ingress-proxy-protocol: "false"
enable-ingress-secrets-sync: "true"
enable-ipv4: "true"
enable-ipv4-big-tcp: "false"
enable-ipv4-masquerade: "false"
enable-ipv6: "false"
enable-ipv6-big-tcp: "false"
enable-ipv6-masquerade: "true"
enable-k8s-networkpolicy: "true"
enable-k8s-terminating-endpoint: "true"
enable-l2-neigh-discovery: "true"
enable-l7-proxy: "true"
enable-local-node-route: "false"
enable-local-redirect-policy: "false"
enable-masquerade-to-route-source: "false"
enable-metrics: "true"
enable-node-port: "true"
enable-policy: never
enable-remote-node-identity: "true"
enable-sctp: "false"
enable-svc-source-range-check: "true"
enable-vtep: "false"
enable-well-known-identities: "false"
enable-xt-socket-fallback: "true"
enforce-ingress-https: "true"
external-envoy-proxy: "false"
gateway-api-secrets-namespace: cilium-secrets
hubble-disable-tls: "false"
hubble-export-file-max-backups: "5"
hubble-export-file-max-size-mb: "10"
hubble-listen-address: :4244
hubble-socket-path: /var/run/cilium/hubble.sock
hubble-tls-cert-file: /var/lib/cilium/tls/hubble/server.crt
hubble-tls-client-ca-files: /var/lib/cilium/tls/hubble/client-ca.crt
hubble-tls-key-file: /var/lib/cilium/tls/hubble/server.key
identity-allocation-mode: crd
identity-gc-interval: 15m0s
identity-heartbeat-timeout: 30m0s
ingress-default-lb-mode: dedicated
ingress-lb-annotation-prefixes: service.beta.kubernetes.io service.kubernetes.io cloud.google.com
ingress-secrets-namespace: cilium-secrets
ingress-shared-lb-service-name: cilium-ingress
install-no-conntrack-iptables-rules: "false"
ipam: cluster-pool
ipam-cilium-node-update-rate: 15s
k8s-client-burst: "10"
k8s-client-qps: "5"
kube-proxy-replacement: "false"
kube-proxy-replacement-healthz-bind-address: ""
max-connected-clusters: "255"
mesh-auth-enabled: "true"
mesh-auth-gc-interval: 5m0s
mesh-auth-queue-size: "1024"
mesh-auth-rotated-identities-queue-size: "1024"
monitor-aggregation: medium
monitor-aggregation-flags: all
monitor-aggregation-interval: 5s
node-port-bind-protection: "true"
nodes-gc-interval: 5m0s
operator-api-serve-addr: 127.0.0.1:9234
operator-prometheus-serve-addr: :9963
policy-cidr-match-mode: ""
preallocate-bpf-maps: "false"
procfs: /host/proc
proxy-connect-timeout: "2"
proxy-idle-timeout-seconds: "60"
proxy-max-connection-duration-seconds: "0"
proxy-max-requests-per-connection: "0"
proxy-prometheus-port: "9964"
proxy-xff-num-trusted-hops-egress: "0"
proxy-xff-num-trusted-hops-ingress: "0"
remove-cilium-node-taints: "true"
routing-mode: native
service-no-backend-response: reject
set-cilium-is-up-condition: "true"
set-cilium-node-taints: "true"
sidecar-istio-proxy-image: cilium/istio_proxy
skip-cnp-status-startup-clean: "false"
synchronize-k8s-nodes: "true"
tofqdns-dns-reject-response-code: refused
tofqdns-enable-dns-compression: "true"
tofqdns-endpoint-max-ip-per-hostname: "50"
tofqdns-idle-connection-grace-period: 0s
tofqdns-max-deferred-connection-deletes: "10000"
tofqdns-proxy-response-max-delay: 100ms
unmanaged-pod-watcher-interval: "15"
vtep-cidr: ""
vtep-endpoint: ""
vtep-mac: ""
vtep-mask: ""
write-cni-conf-when-ready: /host/etc/cni/net.d/05-cilium.conflist
Thank you for taking a look into this!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
This issue has not seen any activity since it was marked stale. Closing.
Is there an existing issue for this?
What happened?
We have an EKS cluster where we are trying to use the Cilium ingress controller, and the load balancer created for the Ingress cannot always connect to the nodes.
What we see is that the load balancer can connect to some nodes for periods of time, but the behavior is not consistent, and there is no pattern distinguishing the nodes it can reach from the ones it cannot.
Connecting directly from the nodes to the NodePort opened for the load balancer does not work either, so it should not be a security group problem. We nevertheless tried allowing traffic from every internal address, with no effect: some nodes work and others do not, and sometimes no node at all is reachable by the load balancer.
I checked, and all the nodes have this Cilium LB configuration for the NodePort:
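As an illustration, this is roughly how the NodePort can be checked from a node (the IP and port are examples, not the real values):

```shell
# From inside a node, query the NodePort that the LB targets (address/port are examples)
curl -v --connect-timeout 5 http://10.0.12.34:30123/

# From the cilium-agent pod on that node, list the service/NodePort entries
kubectl -n kube-system exec ds/cilium -- cilium service list

# Check the agent's overall state, including the embedded Envoy proxy
kubectl -n kube-system exec ds/cilium -- cilium status --verbose
```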
Configuration values used:
cni-config configmap values:
Cilium Version
We tried multiple versions:
Kernel Version
Linux 5.10.215-203.850.amzn2.aarch64
Kubernetes Version
v1.26.15
Regression
No response
Sysdump
Relevant log output
No response
Anything else?
No response
Cilium Users Document
Code of Conduct