aws / eks-anywhere

Run Amazon EKS on your own infrastructure 🚀
https://anywhere.eks.amazonaws.com
Apache License 2.0

Issue upgrading to cilium 1.10 #2175

Closed jaxesn closed 2 years ago

jaxesn commented 2 years ago

What happened:

When testing EKS Anywhere with public.ecr.aws/isovalent/cilium:v1.10.8-eksa.1 in place of public.ecr.aws/isovalent/cilium:v1.9.13-eksa.2, we appear to be running into https://github.com/cilium/cilium/issues/18462
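One way to pull the failing `tc` invocation out of the agent output is to parse the `key=value` log lines and keep only the `level=error` entries. The following is a minimal Python sketch, assuming the log layout seen in the agent logs posted in this issue; `parse_line`, `failed_commands`, and `SAMPLE_LOG` are illustrative names, not part of Cilium or EKS Anywhere.

```python
import shlex

# A few lines in the same layout as the agent logs in this issue (sample only).
SAMPLE_LOG = r'''level=info msg="Connected to apiserver" subsys=k8s
level=error msg="Command execution failed" cmd="[tc filter replace dev lxc5880e699e966 ingress prio 1 handle 1 bpf da obj 3256_next/bpf_lxc.o sec from-container]" error="exit status 1" subsys=datapath-loader
level=warning msg="BTF debug data section '.BTF' rejected: Invalid argument (22)!" subsys=datapath-loader
level=warning subsys=datapath-loader
'''


def parse_line(line):
    """Split one key=value log line into a dict; quoted values stay intact."""
    fields = {}
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes (e.g. a truncated line): skip rather than guess.
        return fields
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def failed_commands(log_text):
    """Return (cmd, error) pairs for every error-level line that carries a cmd field."""
    results = []
    for line in log_text.splitlines():
        fields = parse_line(line)
        if fields.get("level") == "error" and "cmd" in fields:
            results.append((fields["cmd"], fields.get("error", "")))
    return results


for cmd, err in failed_commands(SAMPLE_LOG):
    print(f"{err}: {cmd}")
```

Feeding the full output of the Cilium pod into `failed_commands` should list each failed `tc filter replace` attempt once per retry, which makes it easier to see whether the same endpoint keeps failing.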

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

Environment:
Ubuntu version: Ubuntu 20.04.4 LTS (GNU/Linux 5.4.0-110-generic x86_64)
uname -a: Linux jgw-vsphere-55v5f 5.4.0-110-generic #124-Ubuntu SMP Thu Apr 14 19:46:19 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

jaxesn commented 2 years ago

ConfigMap (CM):

  auto-direct-node-routes: "false"
  bpf-lb-external-clusterip: "false"
  bpf-lb-map-max: "65536"
  bpf-map-dynamic-size-ratio: "0.0025"
  bpf-policy-map-max: "16384"
  cgroup-root: /run/cilium/cgroupv2
  cilium-endpoint-gc-interval: 5m0s
  cluster-id: ""
  cluster-name: default
  cni-chaining-mode: portmap
  custom-cni-conf: "false"
  debug: "false"
  disable-cnp-status-updates: "true"
  egress-gateway-healhcheck-timeout: 2s
  enable-auto-protect-node-port-range: "true"
  enable-bandwidth-manager: "false"
  enable-bpf-clock-probe: "true"
  enable-endpoint-health-checking: "true"
  enable-health-check-nodeport: "true"
  enable-health-checking: "true"
  enable-hubble: "true"
  enable-ipv4: "true"
  enable-ipv4-masquerade: "true"
  enable-ipv6: "false"
  enable-ipv6-masquerade: "true"
  enable-l2-neigh-discovery: "true"
  enable-l7-proxy: "true"
  enable-local-redirect-policy: "false"
  enable-metrics: "true"
  enable-policy: default
  enable-remote-node-identity: "true"
  enable-session-affinity: "true"
  enable-well-known-identities: "false"
  enable-xt-socket-fallback: "true"
  hubble-disable-tls: "false"
  hubble-listen-address: :4244
  hubble-socket-path: /var/run/cilium/hubble.sock
  hubble-tls-cert-file: /var/lib/cilium/tls/hubble/server.crt
  hubble-tls-client-ca-files: /var/lib/cilium/tls/hubble/client-ca.crt
  hubble-tls-key-file: /var/lib/cilium/tls/hubble/server.key
  identity-allocation-mode: crd
  install-iptables-rules: "true"
  install-no-conntrack-iptables-rules: "false"
  ipam: kubernetes
  kube-proxy-replacement: disabled
  monitor-aggregation: medium
  monitor-aggregation-flags: all
  monitor-aggregation-interval: 5s
  node-port-bind-protection: "true"
  operator-api-serve-addr: 127.0.0.1:9234
  operator-prometheus-serve-addr: :6942
  preallocate-bpf-maps: "false"
  prometheus-serve-addr: :9090
  proxy-prometheus-port: "9095"
  sidecar-istio-proxy-image: cilium/istio_proxy
  tunnel: geneve
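
The `bpf-map-dynamic-size-ratio` setting above can be cross-checked against the "Memory available for map entries" line in the agent log: 0.0025 of 8344240128 B comes to 20860600 B, which matches the logged figure (the "0.003%" shown in that log line looks like a rounded display of the same ratio). The sketch below, in Python, parses a flat `key: value` dump like the one above and redoes that arithmetic; `parse_config` and `map_memory_budget` are hypothetical helper names, and the truncating rounding is an assumption chosen because it reproduces the logged number.

```python
from fractions import Fraction

# Subset of the ConfigMap dump above, in the same "key: value" layout.
CONFIG_DUMP = '''
  bpf-lb-map-max: "65536"
  bpf-map-dynamic-size-ratio: "0.0025"
  bpf-policy-map-max: "16384"
  hubble-listen-address: :4244
  kube-proxy-replacement: disabled
  tunnel: geneve
'''


def parse_config(dump):
    """Turn 'key: value' lines into a dict, dropping surrounding double quotes."""
    settings = {}
    for line in dump.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so values such as ":4244" survive.
        key, _, value = line.partition(":")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def map_memory_budget(total_bytes, ratio):
    """Bytes available for map entries: total memory times the ratio, truncated."""
    return int(total_bytes * Fraction(ratio))


settings = parse_config(CONFIG_DUMP)
# 8344240128 is the total memory reported in the agent log of this issue.
budget = map_memory_budget(8344240128, settings["bpf-map-dynamic-size-ratio"])
print(settings["tunnel"], settings["kube-proxy-replacement"], budget)
```

Using `Fraction` on the string value keeps the multiplication exact, so the result does not depend on floating-point rounding.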
jaxesn commented 2 years ago

Pod logs:

level=info msg="Started gops server" address="127.0.0.1:9890" subsys=daemon
level=info msg="Memory available for map entries (0.003% of 8344240128B): 20860600B" subsys=config
level=info msg="option bpf-ct-global-tcp-max set by dynamic sizing to 131072" subsys=config
level=info msg="option bpf-ct-global-any-max set by dynamic sizing to 65536" subsys=config
level=info msg="option bpf-nat-global-max set by dynamic sizing to 131072" subsys=config
level=info msg="option bpf-neigh-global-max set by dynamic sizing to 131072" subsys=config
level=info msg="option bpf-sock-rev-map-max set by dynamic sizing to 65536" subsys=config
level=info msg="  --bpf-map-dynamic-size-ratio='0.0025'" subsys=daemon
level=info msg="  --cmdref=''" subsys=daemon
level=info msg="  --config=''" subsys=daemon
level=info msg="  --config-dir='/tmp/cilium/config-map'" subsys=daemon
level=info msg="  --debug='false'" subsys=daemon
level=info msg="  --debug-verbose=''" subsys=daemon
level=info msg="  --egress-masquerade-interfaces=''" subsys=daemon
level=info msg="  --exclude-local-address=''" subsys=daemon
level=info msg="  --labels=''" subsys=daemon
level=info msg="  --log-opt=''" subsys=daemon
level=info msg="  --mtu='0'" subsys=daemon
level=info msg="  --prometheus-serve-addr=':9090'" subsys=daemon
level=info msg="  --version='false'" subsys=daemon
level=info msg="     _ _ _" subsys=daemon
level=info msg=" ___|_| |_|_ _ _____" subsys=daemon
level=info msg="|  _| | | | | |     |" subsys=daemon
level=info msg="|___|_|_|_|___|_|_|_|" subsys=daemon
level=info msg="Cilium 1.10.8-eksa.1 c233628 2022-02-25T13:31:12-08:00 go version go1.16.14 linux/amd64" subsys=daemon
level=info msg="cilium-envoy  version: 9c0d933166ba192713f9e2fc3901f788557286ee/1.21.1/Distribution/RELEASE/BoringSSL" subsys=daemon
level=info msg="clang (10.0.0) and kernel (5.4.0) versions: OK!" subsys=linux-datapath
level=info msg="linking environment: OK!" subsys=linux-datapath
level=info msg="Detected mounted BPF filesystem at /sys/fs/bpf" subsys=bpf
level=info msg="Mounted cgroupv2 filesystem at /run/cilium/cgroupv2" subsys=cgroups
level=info msg="Parsing base label prefixes from default label list" subsys=labels-filter
level=info msg="Parsing additional label prefixes from user inputs: []" subsys=labels-filter
level=info msg="Final label prefixes to be used for identity evaluation:" subsys=labels-filter
level=info msg=" - reserved:.*" subsys=labels-filter
level=info msg=" - :io\\.kubernetes\\.pod\\.namespace" subsys=labels-filter
level=info msg=" - :io\\.cilium\\.k8s\\.namespace\\.labels" subsys=labels-filter
level=info msg=" - :app\\.kubernetes\\.io" subsys=labels-filter
level=info msg=" - !:io\\.kubernetes" subsys=labels-filter
level=info msg=" - !:kubernetes\\.io" subsys=labels-filter
level=info msg=" - !:.*beta\\.kubernetes\\.io" subsys=labels-filter
level=info msg=" - !:k8s\\.io" subsys=labels-filter
level=info msg=" - !:pod-template-generation" subsys=labels-filter
level=info msg=" - !:pod-template-hash" subsys=labels-filter
level=info msg=" - !:controller-revision-hash" subsys=labels-filter
level=info msg=" - !:annotation.*" subsys=labels-filter
level=info msg=" - !:etcd_node" subsys=labels-filter
level=info msg="Using autogenerated IPv4 allocation range" subsys=node v4Prefix=10.225.0.0/16
level=info msg="Initializing daemon" subsys=daemon
level=info msg="Establishing connection to apiserver" host="https://10.96.0.1:443" subsys=k8s
level=info msg="Connected to apiserver" subsys=k8s
level=info msg="Auto-disabling \"enable-node-port\", \"enable-external-ips\", \"enable-host-reachable-services\", \"enable-host-port\" features and falling back to \"enable-host-legacy-routing\"" subsys=daemon
level=info msg="Inheriting MTU from external network interface" device=eth0 ipAddr=195.17.128.225 mtu=1500 subsys=mtu
level=info msg="Restored services from maps" failed=0 restored=6 subsys=service
level=info msg="L7 proxies are disabled" subsys=daemon
level=info msg="Reading old endpoints..." subsys=daemon
level=info msg="Waiting until all Cilium CRDs are available" subsys=k8s
level=info msg="All Cilium CRDs have been found and are available" subsys=k8s
level=info msg="Retrieved node information from kubernetes node" nodeName=jgw-vsphere-rbwnn subsys=k8s
level=info msg="Received own node information from API server" ipAddr.ipv4=195.17.128.225 ipAddr.ipv6="<nil>" k8sNodeIP=195.17.128.225 labels="map[beta.kubernetes.io/arch:amd64 beta.kubernetes.io/instance-type:vsphere-vm.cpu-2.mem-8gb.os-ubuntu beta.kubernetes.io/os:linux kubernetes.io/arch:amd64 kubernetes.io/hostname:jgw-vsphere-rbwnn kubernetes.io/os:linux node-role.kubernetes.io/control-plane: node-role.kubernetes.io/master: node.kubernetes.io/exclude-from-external-load-balancers: node.kubernetes.io/instance-type:vsphere-vm.cpu-2.mem-8gb.os-ubuntu]" nodeName=jgw-vsphere-rbwnn subsys=k8s v4Prefix=192.168.3.0/24 v6Prefix="<nil>"
level=info msg="Restored router IPs from node information" ipv4=192.168.3.216 ipv6="<nil>" subsys=k8s
level=info msg="k8s mode: Allowing localhost to reach local endpoints" subsys=daemon
level=info msg="Enabling k8s event listener" subsys=k8s-watcher
level=info msg="Removing stale endpoint interfaces" subsys=daemon
level=info msg="Waiting until all pre-existing resources have been received" subsys=k8s-watcher
level=info msg="Skipping kvstore configuration" subsys=daemon
level=info msg="Restored router address from node_config" file=/var/run/cilium/state/globals/node_config.h ipv4=192.168.3.216 ipv6="<nil>" subsys=node
level=info msg="Initializing node addressing" subsys=daemon
level=info msg="Initializing kubernetes IPAM" subsys=ipam v4Prefix=192.168.3.0/24 v6Prefix="<nil>"
level=info msg="Restoring endpoints..." subsys=daemon
level=warning msg="Unable to restore endpoint, ignoring" endpointID=1126 error="interface lxc1c5525f58153 could not be found" k8sPodName=kube-system/vsphere-csi-node-7pm7l subsys=daemon
level=info msg="Endpoints restored" failed=1 restored=1 subsys=daemon
level=info msg="Addressing information:" subsys=daemon
level=info msg="  Cluster-Name: default" subsys=daemon
level=info msg="  Cluster-ID: 0" subsys=daemon
level=info msg="  Local node-name: jgw-vsphere-rbwnn" subsys=daemon
level=info msg="  Node-IPv6: <nil>" subsys=daemon
level=info msg="  External-Node IPv4: 195.17.128.225" subsys=daemon
level=info msg="  Internal-Node IPv4: 192.168.3.216" subsys=daemon
level=info msg="  IPv4 allocation prefix: 192.168.3.0/24" subsys=daemon
level=info msg="  Loopback IPv4: 169.254.42.1" subsys=daemon
level=info msg="  Local IPv4 addresses:" subsys=daemon
level=info msg="  - 195.17.128.225" subsys=daemon
level=info msg="  - 192.168.3.216" subsys=daemon
level=info msg="Creating or updating CiliumNode resource" node=jgw-vsphere-rbwnn subsys=nodediscovery
level=info msg="Adding local node to cluster" node="{jgw-vsphere-rbwnn default [{ExternalIP 195.17.128.225} {InternalIP 195.17.128.225} {CiliumInternalIP 192.168.3.216}] 192.168.3.0/24 <nil> <nil> <nil> 0 local 0 map[beta.kubernetes.io/arch:amd64 beta.kubernetes.io/instance-type:vsphere-vm.cpu-2.mem-8gb.os-ubuntu beta.kubernetes.io/os:linux kubernetes.io/arch:amd64 kubernetes.io/hostname:jgw-vsphere-rbwnn kubernetes.io/os:linux node-role.kubernetes.io/control-plane: node-role.kubernetes.io/master: node.kubernetes.io/exclude-from-external-load-balancers: node.kubernetes.io/instance-type:vsphere-vm.cpu-2.mem-8gb.os-ubuntu] 1 }" subsys=nodediscovery
level=info msg="All pre-existing resources have been received; continuing" subsys=k8s-watcher
level=info msg="Annotating k8s node" subsys=daemon v4CiliumHostIP.IPv4=192.168.3.216 v4Prefix=192.168.3.0/24 v4healthIP.IPv4="<nil>" v6CiliumHostIP.IPv6="<nil>" v6Prefix="<nil>" v6healthIP.IPv6="<nil>"
level=info msg="Initializing identity allocator" subsys=identity-cache
level=info msg="regenerating all endpoints" reason="one or more identities created or deleted" subsys=endpoint-manager
level=info msg="Setting up BPF datapath" bpfClockSource=ktime bpfInsnSet=v2 subsys=datapath-loader
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.core.bpf_jit_enable sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.all.rp_filter sysParamValue=0
level=info msg="Setting sysctl" subsys=sysctl sysParamName=kernel.unprivileged_bpf_disabled sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=kernel.timer_migration sysParamValue=0
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.cilium_host.forwarding sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.cilium_host.rp_filter sysParamValue=0
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.cilium_host.accept_local sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.cilium_host.send_redirects sysParamValue=0
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.cilium_net.forwarding sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.cilium_net.rp_filter sysParamValue=0
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.cilium_net.accept_local sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.cilium_net.send_redirects sysParamValue=0
level=info msg="Serving cilium node monitor v1.2 API at unix:///var/run/cilium/monitor1_2.sock" subsys=monitor-agent
level=info msg="Validating configured node address ranges" subsys=daemon
level=info msg="Starting connection tracking garbage collector" subsys=daemon
level=info msg="Starting IP identity watcher" subsys=ipcache
level=info msg="Initial scan of connection tracking completed" subsys=ct-gc
level=info msg="Regenerating restored endpoints" numRestored=1 subsys=daemon
level=info msg="Datapath signal listener running" subsys=signal
level=info msg="New endpoint" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=4062 identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=info msg="Successfully restored endpoint. Scheduling regeneration" endpointID=4062 subsys=daemon
level=info msg="Removed endpoint" containerID=89a8f28392 datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=1126 identity=17048 ipv4=192.168.3.108 ipv6= k8sPodName=kube-system/vsphere-csi-node-7pm7l subsys=endpoint
level=info msg="Serving prometheus metrics on :9090" subsys=daemon
level=info msg="Started healthz status API server" address="127.0.0.1:9876" subsys=daemon
level=info msg="Initializing Cilium API" subsys=daemon
level=info msg="Daemon initialization completed" bootstrapTime=3.778104623s subsys=daemon
level=info msg="Hubble server is disabled" subsys=hubble
level=info msg="Serving cilium API at unix:///var/run/cilium/cilium.sock" subsys=daemon
level=info msg="Compiled new BPF template" BPFCompilationTime=1.008160071s file-path=/var/run/cilium/state/templates/3bb54f1ab8b3115619d5b3607dab8d0b51b481557bfd60366d630fe9e13cc235/bpf_host.o subsys=datapath-loader
level=info msg="Rewrote endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=4062 identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=info msg="Restored endpoint" endpointID=4062 ipAddr="[ ]" subsys=endpoint
level=info msg="Finished regenerating restored endpoints" regenerated=1 subsys=daemon total=1
level=info msg="Removed stale bpf map" file-path=/sys/fs/bpf/tc/globals/cilium_capture_cache subsys=daemon
level=info msg="Removed stale bpf map" file-path=/sys/fs/bpf/tc/globals/cilium_ktime_cache subsys=daemon
level=info msg="Processing API request with rate limiter" maxWaitDuration=15s name=endpoint-create parallelRequests=4 rateLimiterSkipped=true subsys=rate uuid=f28c3982-e062-4e1e-a808-f5f628b06678
level=info msg="API request released by rate limiter" maxWaitDuration=15s name=endpoint-create parallelRequests=4 rateLimiterSkipped=true subsys=rate uuid=f28c3982-e062-4e1e-a808-f5f628b06678 waitDurationTotal=0s
level=info msg="Create endpoint request" addressing="&{192.168.3.249 e6ced21b-25a5-4e09-86b2-be194a5a2f49  }" containerID=b8d71b77c38189c31c763b907a8da6b480b14e11278bbf9e5cba126eb52f986d datapathConfiguration="<nil>" interface=lxc5880e699e966 k8sPodName=kube-system/vsphere-csi-node-7pm7l labels="[]" subsys=daemon sync-build=true
level=info msg="New endpoint" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3256 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=info msg="Resolving identity labels (blocking)" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3256 identityLabels="k8s:app=vsphere-csi-node,k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=kube-system,k8s:io.cilium.k8s.policy.cluster=default,k8s:io.cilium.k8s.policy.serviceaccount=default,k8s:io.kubernetes.pod.namespace=kube-system,k8s:role=vsphere-csi" ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=info msg="Reusing existing global key" key="k8s:app=vsphere-csi-node;k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=kube-system;k8s:io.cilium.k8s.policy.cluster=default;k8s:io.cilium.k8s.policy.serviceaccount=default;k8s:io.kubernetes.pod.namespace=kube-system;k8s:role=vsphere-csi;" subsys=allocator
level=info msg="Identity of endpoint changed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3256 identity=17048 identityLabels="k8s:app=vsphere-csi-node,k8s:io.cilium.k8s.namespace.labels.kubernetes.io/metadata.name=kube-system,k8s:io.cilium.k8s.policy.cluster=default,k8s:io.cilium.k8s.policy.serviceaccount=default,k8s:io.kubernetes.pod.namespace=kube-system,k8s:role=vsphere-csi" ipv4= ipv6= k8sPodName=/ oldIdentity="no identity" subsys=endpoint
level=info msg="Waiting for endpoint to be generated" containerID= datapathPolicyRevision=0 desiredPolicyRevision=0 endpointID=3256 identity=17048 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=info msg="Compiled new BPF template" BPFCompilationTime=3.966810889s file-path=/var/run/cilium/state/templates/349a9fc844717dd715abd878fd42c8c0ff1cc6eee4ddc3099a68807a9dd9ab29/bpf_lxc.o subsys=datapath-loader
level=error msg="Command execution failed" cmd="[tc filter replace dev lxc5880e699e966 ingress prio 1 handle 1 bpf da obj 3256_next/bpf_lxc.o sec from-container]" error="exit status 1" subsys=datapath-loader
level=warning subsys=datapath-loader
level=warning msg="BTF debug data section '.BTF' rejected: Invalid argument (22)!" subsys=datapath-loader
level=warning msg=" - Length:       21070" subsys=datapath-loader
level=warning msg="Verifier analysis:" subsys=datapath-loader
level=warning subsys=datapath-loader
level=warning msg="magic: 0xeb9f" subsys=datapath-loader
level=warning msg="version: 1" subsys=datapath-loader
level=warning msg="flags: 0x0" subsys=datapath-loader
level=warning msg="hdr_len: 24" subsys=datapath-loader
level=warning msg="type_off: 0" subsys=datapath-loader
level=warning msg="type_len: 1968" subsys=datapath-loader
level=warning msg="str_off: 1968" subsys=datapath-loader
level=warning msg="str_len: 19078" subsys=datapath-loader
level=warning msg="btf_total_size: 21070" subsys=datapath-loader
level=warning msg="[1] PTR (anon) type_id=2" subsys=datapath-loader
level=warning msg="[2] STRUCT __sk_buff size=184 vlen=32" subsys=datapath-loader
level=warning msg="\tlen type_id=3 bits_offset=0" subsys=datapath-loader
level=warning msg="\tpkt_type type_id=3 bits_offset=32" subsys=datapath-loader
level=warning msg="\tmark type_id=3 bits_offset=64" subsys=datapath-loader
level=warning msg="\tqueue_mapping type_id=3 bits_offset=96" subsys=datapath-loader
level=warning msg="\tprotocol type_id=3 bits_offset=128" subsys=datapath-loader
level=warning msg="\tvlan_present type_id=3 bits_offset=160" subsys=datapath-loader
level=warning msg="\tvlan_tci type_id=3 bits_offset=192" subsys=datapath-loader
level=warning msg="\tvlan_proto type_id=3 bits_offset=224" subsys=datapath-loader
level=warning msg="\tpriority type_id=3 bits_offset=256" subsys=datapath-loader
level=warning msg="\tingress_ifindex type_id=3 bits_offset=288" subsys=datapath-loader
level=warning msg="\tifindex type_id=3 bits_offset=320" subsys=datapath-loader
level=warning msg="\ttc_index type_id=3 bits_offset=352" subsys=datapath-loader
level=warning msg="\tcb type_id=5 bits_offset=384" subsys=datapath-loader
level=warning msg="\thash type_id=3 bits_offset=544" subsys=datapath-loader
level=warning msg="\ttc_classid type_id=3 bits_offset=576" subsys=datapath-loader
level=warning msg="\tdata type_id=3 bits_offset=608" subsys=datapath-loader
level=warning msg="\tdata_end type_id=3 bits_offset=640" subsys=datapath-loader
level=warning msg="\tnapi_id type_id=3 bits_offset=672" subsys=datapath-loader
level=warning msg="\tfamily type_id=3 bits_offset=704" subsys=datapath-loader
level=warning msg="\tremote_ip4 type_id=3 bits_offset=736" subsys=datapath-loader
level=warning msg="\tlocal_ip4 type_id=3 bits_offset=768" subsys=datapath-loader
level=warning msg="\tremote_ip6 type_id=7 bits_offset=800" subsys=datapath-loader
level=warning msg="\tlocal_ip6 type_id=7 bits_offset=928" subsys=datapath-loader
level=warning msg="\tremote_port type_id=3 bits_offset=1056" subsys=datapath-loader
level=warning msg="\tlocal_port type_id=3 bits_offset=1088" subsys=datapath-loader
level=warning msg="\tdata_meta type_id=3 bits_offset=1120" subsys=datapath-loader
level=warning msg="\t(anon) type_id=8 bits_offset=1152" subsys=datapath-loader
level=warning msg="\ttstamp type_id=10 bits_offset=1216" subsys=datapath-loader
level=warning msg="\twire_len type_id=3 bits_offset=1280" subsys=datapath-loader
level=warning msg="\tgso_segs type_id=3 bits_offset=1312" subsys=datapath-loader
level=warning msg="\t(anon) type_id=12 bits_offset=1344" subsys=datapath-loader
level=warning msg="\tgso_size type_id=3 bits_offset=1408" subsys=datapath-loader
level=warning msg="[3] TYPEDEF __u32 type_id=4" subsys=datapath-loader
level=warning msg="[4] INT unsigned int size=4 bits_offset=0 nr_bits=32 encoding=(none)" subsys=datapath-loader
level=warning msg="[5] ARRAY (anon) type_id=3 index_type_id=6 nr_elems=5" subsys=datapath-loader
level=warning msg="[6] INT __ARRAY_SIZE_TYPE__ size=4 bits_offset=0 nr_bits=32 encoding=(none)" subsys=datapath-loader
level=warning msg="[7] ARRAY (anon) type_id=3 index_type_id=6 nr_elems=4" subsys=datapath-loader
level=warning msg="[8] UNION (anon) size=8 vlen=1" subsys=datapath-loader
level=warning msg="\tflow_keys type_id=9 bits_offset=0" subsys=datapath-loader
level=warning msg="[9] PTR (anon) type_id=68" subsys=datapath-loader
level=warning msg="[10] TYPEDEF __u64 type_id=11" subsys=datapath-loader
level=warning msg="[11] INT long long unsigned int size=8 bits_offset=0 nr_bits=64 encoding=(none)" subsys=datapath-loader
level=warning msg="[12] UNION (anon) size=8 vlen=1" subsys=datapath-loader
level=warning msg="\tsk type_id=13 bits_offset=0" subsys=datapath-loader
level=warning msg="[13] PTR (anon) type_id=69" subsys=datapath-loader
level=warning msg="[14] FUNC_PROTO (anon) return=15 args=(1 ctx)" subsys=datapath-loader
level=warning msg="[15] INT int size=4 bits_offset=0 nr_bits=32 encoding=SIGNED" subsys=datapath-loader
level=warning msg="[16] FUNC __send_drop_notify type_id=14 vlen != 0" subsys=datapath-loader
level=warning subsys=datapath-loader
level=warning msg="Log buffer too small to dump verifier log 16777215 bytes (10 tries)!" subsys=datapath-loader
level=warning msg="Error filling program arrays!" subsys=datapath-loader
level=warning msg="Unable to load program" subsys=datapath-loader
level=warning msg="JoinEP: Failed to load program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3256 error="Failed to load prog with tc: exit status 1" file-path=3256_next/bpf_lxc.o identity=17048 ipv4= ipv6= k8sPodName=/ subsys=datapath-loader veth=lxc5880e699e966
level=error msg="Error while rewriting endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3256 error="Failed to load prog with tc: exit status 1" identity=17048 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="generating BPF for endpoint failed, keeping stale directory." containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3256 file-path=3256_next_fail identity=17048 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="Regeneration of endpoint failed" bpfCompilation=3.966810889s bpfLoadProg=18.383182906s bpfWaitForELF=3.967029088s bpfWriteELF="587.783µs" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3256 error="Failed to load prog with tc: exit status 1" identity=17048 ipv4= ipv6= k8sPodName=/ mapSync="31.398µs" policyCalculation="50.806µs" prepareBuild="374.952µs" proxyConfiguration="6.807µs" proxyPolicyCalculation=294ns proxyWaitForAck=0s reason="updated security labels" subsys=endpoint total=22.352727863s waitingForCTClean="462.165µs" waitingForLock=744ns
level=error msg="endpoint regeneration failed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3256 error="Failed to load prog with tc: exit status 1" identity=17048 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=error msg="Command execution failed" cmd="[tc filter replace dev lxc5880e699e966 ingress prio 1 handle 1 bpf da obj 3256_next/bpf_lxc.o sec from-container]" error="exit status 1" subsys=datapath-loader
level=warning subsys=datapath-loader
level=warning msg="BTF debug data section '.BTF' rejected: Invalid argument (22)!" subsys=datapath-loader
level=warning msg=" - Length:       21070" subsys=datapath-loader
level=warning msg="Verifier analysis:" subsys=datapath-loader
level=warning subsys=datapath-loader
level=warning msg="magic: 0xeb9f" subsys=datapath-loader
level=warning msg="version: 1" subsys=datapath-loader
level=warning msg="flags: 0x0" subsys=datapath-loader
level=warning msg="hdr_len: 24" subsys=datapath-loader
level=warning msg="type_off: 0" subsys=datapath-loader
level=warning msg="type_len: 1968" subsys=datapath-loader
level=warning msg="str_off: 1968" subsys=datapath-loader
level=warning msg="str_len: 19078" subsys=datapath-loader
level=warning msg="btf_total_size: 21070" subsys=datapath-loader
level=warning msg="[1] PTR (anon) type_id=2" subsys=datapath-loader
level=warning msg="[2] STRUCT __sk_buff size=184 vlen=32" subsys=datapath-loader
level=warning msg="\tlen type_id=3 bits_offset=0" subsys=datapath-loader
level=warning msg="\tpkt_type type_id=3 bits_offset=32" subsys=datapath-loader
level=warning msg="\tmark type_id=3 bits_offset=64" subsys=datapath-loader
level=warning msg="\tqueue_mapping type_id=3 bits_offset=96" subsys=datapath-loader
level=warning msg="\tprotocol type_id=3 bits_offset=128" subsys=datapath-loader
level=warning msg="\tvlan_present type_id=3 bits_offset=160" subsys=datapath-loader
level=warning msg="\tvlan_tci type_id=3 bits_offset=192" subsys=datapath-loader
level=warning msg="\tvlan_proto type_id=3 bits_offset=224" subsys=datapath-loader
level=warning msg="\tpriority type_id=3 bits_offset=256" subsys=datapath-loader
level=warning msg="\tingress_ifindex type_id=3 bits_offset=288" subsys=datapath-loader
level=warning msg="\tifindex type_id=3 bits_offset=320" subsys=datapath-loader
level=warning msg="\ttc_index type_id=3 bits_offset=352" subsys=datapath-loader
level=warning msg="\tcb type_id=5 bits_offset=384" subsys=datapath-loader
level=warning msg="\thash type_id=3 bits_offset=544" subsys=datapath-loader
level=warning msg="\ttc_classid type_id=3 bits_offset=576" subsys=datapath-loader
level=warning msg="\tdata type_id=3 bits_offset=608" subsys=datapath-loader
level=warning msg="\tdata_end type_id=3 bits_offset=640" subsys=datapath-loader
level=warning msg="\tnapi_id type_id=3 bits_offset=672" subsys=datapath-loader
level=warning msg="\tfamily type_id=3 bits_offset=704" subsys=datapath-loader
level=warning msg="\tremote_ip4 type_id=3 bits_offset=736" subsys=datapath-loader
level=warning msg="\tlocal_ip4 type_id=3 bits_offset=768" subsys=datapath-loader
level=warning msg="\tremote_ip6 type_id=7 bits_offset=800" subsys=datapath-loader
level=warning msg="\tlocal_ip6 type_id=7 bits_offset=928" subsys=datapath-loader
level=warning msg="\tremote_port type_id=3 bits_offset=1056" subsys=datapath-loader
level=warning msg="\tlocal_port type_id=3 bits_offset=1088" subsys=datapath-loader
level=warning msg="\tdata_meta type_id=3 bits_offset=1120" subsys=datapath-loader
level=warning msg="\t(anon) type_id=8 bits_offset=1152" subsys=datapath-loader
level=warning msg="\ttstamp type_id=10 bits_offset=1216" subsys=datapath-loader
level=warning msg="\twire_len type_id=3 bits_offset=1280" subsys=datapath-loader
level=warning msg="\tgso_segs type_id=3 bits_offset=1312" subsys=datapath-loader
level=warning msg="\t(anon) type_id=12 bits_offset=1344" subsys=datapath-loader
level=warning msg="\tgso_size type_id=3 bits_offset=1408" subsys=datapath-loader
level=warning msg="[3] TYPEDEF __u32 type_id=4" subsys=datapath-loader
level=warning msg="[4] INT unsigned int size=4 bits_offset=0 nr_bits=32 encoding=(none)" subsys=datapath-loader
level=warning msg="[5] ARRAY (anon) type_id=3 index_type_id=6 nr_elems=5" subsys=datapath-loader
level=warning msg="[6] INT __ARRAY_SIZE_TYPE__ size=4 bits_offset=0 nr_bits=32 encoding=(none)" subsys=datapath-loader
level=warning msg="[7] ARRAY (anon) type_id=3 index_type_id=6 nr_elems=4" subsys=datapath-loader
level=warning msg="[8] UNION (anon) size=8 vlen=1" subsys=datapath-loader
level=warning msg="\tflow_keys type_id=9 bits_offset=0" subsys=datapath-loader
level=warning msg="[9] PTR (anon) type_id=68" subsys=datapath-loader
level=warning msg="[10] TYPEDEF __u64 type_id=11" subsys=datapath-loader
level=warning msg="[11] INT long long unsigned int size=8 bits_offset=0 nr_bits=64 encoding=(none)" subsys=datapath-loader
level=warning msg="[12] UNION (anon) size=8 vlen=1" subsys=datapath-loader
level=warning msg="\tsk type_id=13 bits_offset=0" subsys=datapath-loader
level=warning msg="[13] PTR (anon) type_id=69" subsys=datapath-loader
level=warning msg="[14] FUNC_PROTO (anon) return=15 args=(1 ctx)" subsys=datapath-loader
level=warning msg="[15] INT int size=4 bits_offset=0 nr_bits=32 encoding=SIGNED" subsys=datapath-loader
level=warning msg="[16] FUNC __send_drop_notify type_id=14 vlen != 0" subsys=datapath-loader
level=warning subsys=datapath-loader
level=warning msg="Log buffer too small to dump verifier log 16777215 bytes (10 tries)!" subsys=datapath-loader
level=warning msg="Error filling program arrays!" subsys=datapath-loader
level=warning msg="Unable to load program" subsys=datapath-loader
level=warning msg="JoinEP: Failed to load program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3256 error="Failed to load prog with tc: exit status 1" file-path=3256_next/bpf_lxc.o identity=17048 ipv4= ipv6= k8sPodName=/ subsys=datapath-loader veth=lxc5880e699e966
level=error msg="Error while rewriting endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3256 error="Failed to load prog with tc: exit status 1" identity=17048 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="generating BPF for endpoint failed, keeping stale directory." containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3256 file-path=3256_next_fail identity=17048 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="Regeneration of endpoint failed" bpfCompilation=0s bpfLoadProg=18.167772879s bpfWaitForELF="3.933µs" bpfWriteELF="492.696µs" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3256 error="Failed to load prog with tc: exit status 1" identity=17048 ipv4= ipv6= k8sPodName=/ mapSync="8.713µs" policyCalculation="4.103µs" prepareBuild="577.311µs" proxyConfiguration="6.828µs" proxyPolicyCalculation=278ns proxyWaitForAck=0s reason="retrying regeneration" subsys=endpoint total=18.170210635s waitingForCTClean="488.83µs" waitingForLock=799ns
level=error msg="endpoint regeneration failed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3256 error="Failed to load prog with tc: exit status 1" identity=17048 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=error msg="Command execution failed" cmd="[tc filter replace dev lxc5880e699e966 ingress prio 1 handle 1 bpf da obj 3256_next/bpf_lxc.o sec from-container]" error="exit status 1" subsys=datapath-loader
level=warning subsys=datapath-loader
level=warning msg="BTF debug data section '.BTF' rejected: Invalid argument (22)!" subsys=datapath-loader
level=warning msg=" - Length:       21070" subsys=datapath-loader
level=warning msg="Verifier analysis:" subsys=datapath-loader
level=warning subsys=datapath-loader
level=warning msg="magic: 0xeb9f" subsys=datapath-loader
level=warning msg="version: 1" subsys=datapath-loader
level=warning msg="flags: 0x0" subsys=datapath-loader
level=warning msg="hdr_len: 24" subsys=datapath-loader
level=warning msg="type_off: 0" subsys=datapath-loader
level=warning msg="type_len: 1968" subsys=datapath-loader
level=warning msg="str_off: 1968" subsys=datapath-loader
level=warning msg="str_len: 19078" subsys=datapath-loader
level=warning msg="btf_total_size: 21070" subsys=datapath-loader
level=warning msg="[1] PTR (anon) type_id=2" subsys=datapath-loader
level=warning msg="[2] STRUCT __sk_buff size=184 vlen=32" subsys=datapath-loader
level=warning msg="\tlen type_id=3 bits_offset=0" subsys=datapath-loader
level=warning msg="\tpkt_type type_id=3 bits_offset=32" subsys=datapath-loader
level=warning msg="\tmark type_id=3 bits_offset=64" subsys=datapath-loader
level=warning msg="\tqueue_mapping type_id=3 bits_offset=96" subsys=datapath-loader
level=warning msg="\tprotocol type_id=3 bits_offset=128" subsys=datapath-loader
level=warning msg="\tvlan_present type_id=3 bits_offset=160" subsys=datapath-loader
level=warning msg="\tvlan_tci type_id=3 bits_offset=192" subsys=datapath-loader
level=warning msg="\tvlan_proto type_id=3 bits_offset=224" subsys=datapath-loader
level=warning msg="\tpriority type_id=3 bits_offset=256" subsys=datapath-loader
level=warning msg="\tingress_ifindex type_id=3 bits_offset=288" subsys=datapath-loader
level=warning msg="\tifindex type_id=3 bits_offset=320" subsys=datapath-loader
level=warning msg="\ttc_index type_id=3 bits_offset=352" subsys=datapath-loader
level=warning msg="\tcb type_id=5 bits_offset=384" subsys=datapath-loader
level=warning msg="\thash type_id=3 bits_offset=544" subsys=datapath-loader
level=warning msg="\ttc_classid type_id=3 bits_offset=576" subsys=datapath-loader
level=warning msg="\tdata type_id=3 bits_offset=608" subsys=datapath-loader
level=warning msg="\tdata_end type_id=3 bits_offset=640" subsys=datapath-loader
level=warning msg="\tnapi_id type_id=3 bits_offset=672" subsys=datapath-loader
level=warning msg="\tfamily type_id=3 bits_offset=704" subsys=datapath-loader
level=warning msg="\tremote_ip4 type_id=3 bits_offset=736" subsys=datapath-loader
level=warning msg="\tlocal_ip4 type_id=3 bits_offset=768" subsys=datapath-loader
level=warning msg="\tremote_ip6 type_id=7 bits_offset=800" subsys=datapath-loader
level=warning msg="\tlocal_ip6 type_id=7 bits_offset=928" subsys=datapath-loader
level=warning msg="\tremote_port type_id=3 bits_offset=1056" subsys=datapath-loader
level=warning msg="\tlocal_port type_id=3 bits_offset=1088" subsys=datapath-loader
level=warning msg="\tdata_meta type_id=3 bits_offset=1120" subsys=datapath-loader
level=warning msg="\t(anon) type_id=8 bits_offset=1152" subsys=datapath-loader
level=warning msg="\ttstamp type_id=10 bits_offset=1216" subsys=datapath-loader
level=warning msg="\twire_len type_id=3 bits_offset=1280" subsys=datapath-loader
level=warning msg="\tgso_segs type_id=3 bits_offset=1312" subsys=datapath-loader
level=warning msg="\t(anon) type_id=12 bits_offset=1344" subsys=datapath-loader
level=warning msg="\tgso_size type_id=3 bits_offset=1408" subsys=datapath-loader
level=warning msg="[3] TYPEDEF __u32 type_id=4" subsys=datapath-loader
level=warning msg="[4] INT unsigned int size=4 bits_offset=0 nr_bits=32 encoding=(none)" subsys=datapath-loader
level=warning msg="[5] ARRAY (anon) type_id=3 index_type_id=6 nr_elems=5" subsys=datapath-loader
level=warning msg="[6] INT __ARRAY_SIZE_TYPE__ size=4 bits_offset=0 nr_bits=32 encoding=(none)" subsys=datapath-loader
level=warning msg="[7] ARRAY (anon) type_id=3 index_type_id=6 nr_elems=4" subsys=datapath-loader
level=warning msg="[8] UNION (anon) size=8 vlen=1" subsys=datapath-loader
level=warning msg="\tflow_keys type_id=9 bits_offset=0" subsys=datapath-loader
level=warning msg="[9] PTR (anon) type_id=68" subsys=datapath-loader
level=warning msg="[10] TYPEDEF __u64 type_id=11" subsys=datapath-loader
level=warning msg="[11] INT long long unsigned int size=8 bits_offset=0 nr_bits=64 encoding=(none)" subsys=datapath-loader
level=warning msg="[12] UNION (anon) size=8 vlen=1" subsys=datapath-loader
level=warning msg="\tsk type_id=13 bits_offset=0" subsys=datapath-loader
level=warning msg="[13] PTR (anon) type_id=69" subsys=datapath-loader
level=warning msg="[14] FUNC_PROTO (anon) return=15 args=(1 ctx)" subsys=datapath-loader
level=warning msg="[15] INT int size=4 bits_offset=0 nr_bits=32 encoding=SIGNED" subsys=datapath-loader
level=warning msg="[16] FUNC __send_drop_notify type_id=14 vlen != 0" subsys=datapath-loader
level=warning subsys=datapath-loader
level=warning msg="Log buffer too small to dump verifier log 16777215 bytes (10 tries)!" subsys=datapath-loader
level=warning msg="Error filling program arrays!" subsys=datapath-loader
level=warning msg="Unable to load program" subsys=datapath-loader
level=warning msg="JoinEP: Failed to load program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3256 error="Failed to load prog with tc: exit status 1" file-path=3256_next/bpf_lxc.o identity=17048 ipv4= ipv6= k8sPodName=/ subsys=datapath-loader veth=lxc5880e699e966
level=error msg="Error while rewriting endpoint BPF program" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3256 error="Failed to load prog with tc: exit status 1" identity=17048 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="generating BPF for endpoint failed, keeping stale directory." containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3256 file-path=3256_next_fail identity=17048 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=warning msg="Regeneration of endpoint failed" bpfCompilation=0s bpfLoadProg=18.235103297s bpfWaitForELF="4.198µs" bpfWriteELF="606.844µs" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3256 error="Failed to load prog with tc: exit status 1" identity=17048 ipv4= ipv6= k8sPodName=/ mapSync="3.44µs" policyCalculation="5.425µs" prepareBuild="436.086µs" proxyConfiguration="8.898µs" proxyPolicyCalculation=302ns proxyWaitForAck=0s reason="retrying regeneration" subsys=endpoint total=18.237776686s waitingForCTClean="505.436µs" waitingForLock="1.453µs"
level=error msg="endpoint regeneration failed" containerID= datapathPolicyRevision=0 desiredPolicyRevision=1 endpointID=3256 error="Failed to load prog with tc: exit status 1" identity=17048 ipv4= ipv6= k8sPodName=/ subsys=endpoint
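
The loader output above is mostly the BTF type dump, so the lines that matter are easy to miss. A minimal sketch for filtering a cilium-agent log down to the error-level lines and the BTF rejection; the function name `loader_failures` is made up for illustration and the patterns are taken only from the messages in this log:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a cilium-agent log on stdin and keep only
# error-level lines, the ".BTF rejected" warning, and the final
# "Unable to load program" warning. Patterns come from the log in this issue.
loader_failures() {
  grep -E 'level=error|rejected|Unable to load program'
}

# Example usage (assumes the agent runs as the "cilium" daemonset in kube-system):
#   kubectl -n kube-system logs ds/cilium | loader_failures
```

This only trims the output; it does not explain why the kernel rejects the BTF section.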
jaxesn commented 2 years ago

Installing 1.10.8 from upstream Cilium with the upstream chart does not show the same issue. I did this by deleting the Cilium install created via eks-a, using this method, and then installing:

helm install -f values.yaml cilium cilium/cilium --version 1.10.8 --namespace kube-system

This used the same values.yaml from above.

Going to dig into the configmap some more to see what differences there may be between the one created by the upstream helm chart and the eks-a one.

To restart all the pods and compare a working Cilium against a broken one, I've been using this command from the Cilium docs site:

kubectl get pods --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork --no-headers=true | grep '<none>' | awk '{print "-n "$1" "$2}' | xargs -L 1 kubectl delete pod --wait=false
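
The one-liner above can be split up so the filtering step is easier to read and to check without a cluster. A rough sketch, with hypothetical function names; it matches the HOSTNETWORK column exactly instead of grepping the whole line for `<none>`, which is a small change from the docs version:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "NAMESPACE NAME HOSTNETWORK" rows on stdin and
# print "-n <namespace> <pod>" for pods that are not on the host network.
# kubectl prints "<none>" in the HOSTNETWORK column when spec.hostNetwork is unset.
non_hostnetwork_pods() {
  awk '$3 == "<none>" { print "-n " $1 " " $2 }'
}

# Hypothetical wrapper: delete (and so restart) every non-host-network pod.
restart_non_hostnetwork_pods() {
  kubectl get pods --all-namespaces \
    -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork \
    --no-headers=true \
    | non_hostnetwork_pods \
    | xargs -L 1 kubectl delete pod --wait=false
}
```

Piping the `kubectl get pods` output through `non_hostnetwork_pods` alone gives a dry run of what would be deleted.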

jaxesn commented 2 years ago

I ran helm template to see the differences between the upstream and eksa templates. There were only two diffs, so I added the following to values.yaml to get rid of them:

terminationGracePeriodSeconds: 1
egressGateway:
  healthcheckTimeout: null

helm template -f values.yaml cilium oci://public.ecr.aws/isovalent/cilium --version 1.10.8-eksa.1 --namespace kube-system > cilium-eksa.yaml

helm template -f values.yaml cilium cilium/cilium --version 1.10.8 --namespace kube-system > cilium-upstream.yaml

This still results in the same error in the cilium logs as before.
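
For anyone repeating the comparison, a minimal sketch of diffing the two rendered files. `compare_rendered` is a made-up name, and the default file names are just the ones written by the helm template commands above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: diff two rendered manifests and say whether they differ.
# Defaults assume the output files from the helm template commands above.
compare_rendered() {
  local upstream="${1:-cilium-upstream.yaml}"
  local eksa="${2:-cilium-eksa.yaml}"
  diff -u "$upstream" "$eksa"
  case $? in
    0) echo "no differences" ;;
    1) echo "templates differ" ;;
    *) echo "diff failed" >&2; return 2 ;;
  esac
}
```

A plain text diff can flag ordering or whitespace changes that don't matter to Kubernetes, so any hunks it reports still need a manual look.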

jaxesn commented 2 years ago

Just to confirm, I installed using the eksa helm chart but with the upstream images from quay:

cni:
  chainingMode: portmap
ipam:
  mode: kubernetes
identityAllocationMode: crd
prometheus:
  enabled: true
rollOutCiliumPods: true
tunnel: geneve
clustermesh:
  config:
    enabled: false
terminationGracePeriodSeconds: 1
egressGateway:
  healthcheckTimeout: null
image:
  repository: quay.io/cilium/cilium
  tag: v1.10.8
operator:
  image:
    repository: quay.io/cilium/operator
    tag: v1.10.8

helm install -f values.yaml cilium oci://public.ecr.aws/isovalent/cilium --version 1.10.8 --namespace kube-system

That combination does not show the same error, which tends to indicate that the eksa container image is the issue?

jaxesn commented 2 years ago

To run the Cilium connectivity tests:

kubectl create ns cilium-test
kubectl apply -n cilium-test -f https://raw.githubusercontent.com/cilium/cilium/v1.10/examples/kubernetes/connectivity-check/connectivity-check.yaml
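
Once the connectivity-check manifest is applied, the quickest signal is whether the pods in cilium-test become ready. A small sketch for listing the ones that are not; `not_ready_pods` is a hypothetical name and it assumes the default `kubectl get pods` column layout (NAME READY STATUS RESTARTS AGE):

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods --no-headers` rows on stdin and
# print "<pod> <status>" for pods whose READY column (e.g. "0/1") shows
# fewer ready containers than the pod has in total.
not_ready_pods() {
  awk '{ split($2, r, "/"); if (r[1] != r[2]) print $1 " " $3 }'
}

# Example usage:
#   kubectl get pods -n cilium-test --no-headers | not_ready_pods
```

An empty result after the pods have had time to start would suggest the checks pass; anything listed is worth a `kubectl describe pod`.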