cilium / cilium

eBPF-based Networking, Security, and Observability
https://cilium.io
Apache License 2.0

cilium fails to start with k3s and ubuntu 22.04 #20331

Closed. rlex closed this 2 years ago.

rlex commented 2 years ago

Is there an existing issue for this?

What happened?

Cilium 1.12.0-rc3 (atm), ubuntu 22.04, k3s v1.23.6+k3s1

As soon as I update k3s to 1.23.7 or 1.24, cilium-agent starts to crashloop.

Here is the relevant Slack thread, just in case: https://cilium.slack.com/archives/C1MATJ5U5/p1655243973200179

I can easily reproduce it by upgrading one of the nodes to an affected version (anything newer than 1.23.6): the cilium-agent fails to start. Downgrading back fixes it instantly.
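
A minimal sketch of one way to do such an upgrade/downgrade with the k3s install script (assuming the node was originally installed with the script; the versions shown are just the ones from this report):

# move a node to an affected version
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION='v1.23.7+k3s1' sh -
# roll back to the last working version
curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION='v1.23.6+k3s1' sh -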

My values for cilium:

autoDirectNodeRoutes: false
bandwidthManager: true
bpf:
  masquerade: true
containerRuntime:
  integration: containerd
debug:
  enabled: false
devices: null
encryption:
  enabled: false
  type: wireguard
endpointRoutes:
  enabled: true
hostFirewall:
  enabled: true
hostServices:
  enabled: true
  protocols: tcp,udp
hubble:
  listenAddress: :4244
  metrics:
    enabled:
    - dns:query;ignoreAAAA
    - drop
    - tcp
    - flow
    - icmp
    - http
    serviceMonitor:
      enabled: true
  relay:
    enabled: true
  tls:
    auto:
      method: cronJob
  ui:
    enabled: true
    ingress:
      annotations:
        kubernetes.io/ingress.class: nginx
      enabled: true
      hosts:
      - hubble
ingressController:
  enabled: true
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4MaskSize: 24
    clusterPoolIPv4PodCIDRList:
    - 10.251.0.0/16
ipv4NativeRoutingCIDR: 10.121.0.0/24
k8sServiceHost: 127.0.0.1
k8sServicePort: 16443
kubeProxyReplacement: strict
l7Proxy: true
loadBalancer:
  mode: dsr
monitor:
  enabled: false
nodeinit:
  restartPods: true
operator:
  nodeSelector:
    node-role.kubernetes.io/master: "true"
  prometheus:
    enabled: true
    port: 6942
    serviceMonitor:
      enabled: true
  replicas: 2
  rollOutPods: true
pprof:
  enabled: true
prometheus:
  enabled: true
  serviceMonitor:
    enabled: true
rollOutCiliumPods: true
tunnel: geneve
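
(For context, values like these would typically be applied via the official Helm chart along these lines; a sketch, where the release name, namespace, and chart version are assumptions:)

helm repo add cilium https://helm.cilium.io/
helm upgrade --install cilium cilium/cilium \
  --namespace kube-system \
  --version 1.12.0-rc3 \
  -f values.yaml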

One of the interesting parts of the logs:

level=debug msg="Cannot find socket" error="stat /var/run/cilium/health.sock: no such file or directory" file-path=/var/run/cilium/health.sock subsys=cilium-health-launcher
level=debug msg="Cannot find socket" error="stat /var/run/cilium/health.sock: no such file or directory" file-path=/var/run/cilium/health.sock subsys=cilium-health-launcher

Might be related to https://github.com/cilium/cilium/issues/8595? But that is a pretty old one.

Cilium Version

cilium-cli: 0.11.10 compiled with go1.18.3 on darwin/arm64
cilium image (default): v1.11.6
cilium image (stable): v1.11.6
cilium image (running): v1.12.0-rc3

Also tried with stable.

Kernel Version

Linux node-1 5.15.0-39-generic #42-Ubuntu SMP Thu Jun 9 23:42:32 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Kubernetes Version

Client Version: v1.24.1
Kustomize Version: v4.5.4
Server Version: v1.23.6+k3s1

Sysdump

No response

Relevant log output

level=info msg="Started gops server" address="127.0.0.1:9890" subsys=daemon
level=info msg="Memory available for map entries (0.003% of 16384200704B): 40960501B" subsys=config
level=info msg="option bpf-ct-global-tcp-max set by dynamic sizing to 143721" subsys=config
level=info msg="option bpf-ct-global-any-max set by dynamic sizing to 71860" subsys=config
level=info msg="option bpf-nat-global-max set by dynamic sizing to 143721" subsys=config
level=info msg="option bpf-neigh-global-max set by dynamic sizing to 143721" subsys=config
level=info msg="option bpf-sock-rev-map-max set by dynamic sizing to 71860" subsys=config
level=info msg="  --agent-health-port='9876'" subsys=daemon
level=info msg="  --agent-labels=''" subsys=daemon
level=info msg="  --allocator-list-timeout='3m0s'" subsys=daemon
level=info msg="  --allow-icmp-frag-needed='true'" subsys=daemon
level=info msg="  --allow-localhost='auto'" subsys=daemon
level=info msg="  --annotate-k8s-node='false'" subsys=daemon
level=info msg="  --api-rate-limit=''" subsys=daemon
level=info msg="  --arping-refresh-period='30s'" subsys=daemon
level=info msg="  --auto-create-cilium-node-resource='true'" subsys=daemon
level=info msg="  --auto-direct-node-routes='false'" subsys=daemon
level=info msg="  --bgp-announce-lb-ip='false'" subsys=daemon
level=info msg="  --bgp-announce-pod-cidr='false'" subsys=daemon
level=info msg="  --bgp-config-path='/var/lib/cilium/bgp/config.yaml'" subsys=daemon
level=info msg="  --bpf-ct-global-any-max='262144'" subsys=daemon
level=info msg="  --bpf-ct-global-tcp-max='524288'" subsys=daemon
level=info msg="  --bpf-ct-timeout-regular-any='1m0s'" subsys=daemon
level=info msg="  --bpf-ct-timeout-regular-tcp='6h0m0s'" subsys=daemon
level=info msg="  --bpf-ct-timeout-regular-tcp-fin='10s'" subsys=daemon
level=info msg="  --bpf-ct-timeout-regular-tcp-syn='1m0s'" subsys=daemon
level=info msg="  --bpf-ct-timeout-service-any='1m0s'" subsys=daemon
level=info msg="  --bpf-ct-timeout-service-tcp='6h0m0s'" subsys=daemon
level=info msg="  --bpf-filter-priority='1'" subsys=daemon
level=info msg="  --bpf-fragments-map-max='8192'" subsys=daemon
level=info msg="  --bpf-lb-acceleration='disabled'" subsys=daemon
level=info msg="  --bpf-lb-affinity-map-max='0'" subsys=daemon
level=info msg="  --bpf-lb-algorithm='random'" subsys=daemon
level=info msg="  --bpf-lb-bypass-fib-lookup='false'" subsys=daemon
level=info msg="  --bpf-lb-dev-ip-addr-inherit=''" subsys=daemon
level=info msg="  --bpf-lb-dsr-dispatch='opt'" subsys=daemon
level=info msg="  --bpf-lb-dsr-l4-xlate='frontend'" subsys=daemon
level=info msg="  --bpf-lb-external-clusterip='false'" subsys=daemon
level=info msg="  --bpf-lb-maglev-hash-seed='JLfvgnHc2kaSUFaI'" subsys=daemon
level=info msg="  --bpf-lb-maglev-map-max='0'" subsys=daemon
level=info msg="  --bpf-lb-maglev-table-size='16381'" subsys=daemon
level=info msg="  --bpf-lb-map-max='65536'" subsys=daemon
level=info msg="  --bpf-lb-mode='dsr'" subsys=daemon
level=info msg="  --bpf-lb-rev-nat-map-max='0'" subsys=daemon
level=info msg="  --bpf-lb-rss-ipv4-src-cidr=''" subsys=daemon
level=info msg="  --bpf-lb-rss-ipv6-src-cidr=''" subsys=daemon
level=info msg="  --bpf-lb-service-backend-map-max='0'" subsys=daemon
level=info msg="  --bpf-lb-service-map-max='0'" subsys=daemon
level=info msg="  --bpf-lb-sock-hostns-only='false'" subsys=daemon
level=info msg="  --bpf-lb-source-range-map-max='0'" subsys=daemon
level=info msg="  --bpf-map-dynamic-size-ratio='0.0025'" subsys=daemon
level=info msg="  --bpf-nat-global-max='524288'" subsys=daemon
level=info msg="  --bpf-neigh-global-max='524288'" subsys=daemon
level=info msg="  --bpf-policy-map-max='16384'" subsys=daemon
level=info msg="  --bpf-root='/sys/fs/bpf'" subsys=daemon
level=info msg="  --bpf-sock-rev-map-max='262144'" subsys=daemon
level=info msg="  --bypass-ip-availability-upon-restore='false'" subsys=daemon
level=info msg="  --certificates-directory='/var/run/cilium/certs'" subsys=daemon
level=info msg="  --cflags=''" subsys=daemon
level=info msg="  --cgroup-root='/run/cilium/cgroupv2'" subsys=daemon
level=info msg="  --cluster-health-port='4240'" subsys=daemon
level=info msg="  --cluster-id=''" subsys=daemon
level=info msg="  --cluster-name='default'" subsys=daemon
level=info msg="  --clustermesh-config='/var/lib/cilium/clustermesh/'" subsys=daemon
level=info msg="  --cmdref=''" subsys=daemon
level=info msg="  --config=''" subsys=daemon
level=info msg="  --config-dir='/tmp/cilium/config-map'" subsys=daemon
level=info msg="  --conntrack-gc-interval='0s'" subsys=daemon
level=info msg="  --crd-wait-timeout='5m0s'" subsys=daemon
level=info msg="  --datapath-mode='veth'" subsys=daemon
level=info msg="  --debug='false'" subsys=daemon
level=info msg="  --debug-verbose=''" subsys=daemon
level=info msg="  --derive-masquerade-ip-addr-from-device=''" subsys=daemon
level=info msg="  --devices=''" subsys=daemon
level=info msg="  --direct-routing-device=''" subsys=daemon
level=info msg="  --disable-cnp-status-updates='true'" subsys=daemon
level=info msg="  --disable-conntrack='false'" subsys=daemon
level=info msg="  --disable-endpoint-crd='false'" subsys=daemon
level=info msg="  --disable-envoy-version-check='false'" subsys=daemon
level=info msg="  --disable-iptables-feeder-rules=''" subsys=daemon
level=info msg="  --dns-max-ips-per-restored-rule='1000'" subsys=daemon
level=info msg="  --dns-policy-unload-on-shutdown='false'" subsys=daemon
level=info msg="  --egress-masquerade-interfaces=''" subsys=daemon
level=info msg="  --egress-multi-home-ip-rule-compat='false'" subsys=daemon
level=info msg="  --enable-auto-protect-node-port-range='true'" subsys=daemon
level=info msg="  --enable-bandwidth-manager='true'" subsys=daemon
level=info msg="  --enable-bbr='false'" subsys=daemon
level=info msg="  --enable-bgp-control-plane='false'" subsys=daemon
level=info msg="  --enable-bpf-clock-probe='true'" subsys=daemon
level=info msg="  --enable-bpf-masquerade='true'" subsys=daemon
level=info msg="  --enable-bpf-tproxy='false'" subsys=daemon
level=info msg="  --enable-cilium-endpoint-slice='false'" subsys=daemon
level=info msg="  --enable-custom-calls='false'" subsys=daemon
level=info msg="  --enable-endpoint-health-checking='true'" subsys=daemon
level=info msg="  --enable-endpoint-routes='true'" subsys=daemon
level=info msg="  --enable-envoy-config='true'" subsys=daemon
level=info msg="  --enable-external-ips='true'" subsys=daemon
level=info msg="  --enable-health-check-nodeport='true'" subsys=daemon
level=info msg="  --enable-health-checking='true'" subsys=daemon
level=info msg="  --enable-host-firewall='true'" subsys=daemon
level=info msg="  --enable-host-legacy-routing='false'" subsys=daemon
level=info msg="  --enable-host-port='true'" subsys=daemon
level=info msg="  --enable-host-reachable-services='true'" subsys=daemon
level=info msg="  --enable-hubble='true'" subsys=daemon
level=info msg="  --enable-hubble-recorder-api='true'" subsys=daemon
level=info msg="  --enable-icmp-rules='false'" subsys=daemon
level=info msg="  --enable-identity-mark='true'" subsys=daemon
level=info msg="  --enable-ip-masq-agent='false'" subsys=daemon
level=info msg="  --enable-ipsec='false'" subsys=daemon
level=info msg="  --enable-ipv4='true'" subsys=daemon
level=info msg="  --enable-ipv4-egress-gateway='false'" subsys=daemon
level=info msg="  --enable-ipv4-fragment-tracking='true'" subsys=daemon
level=info msg="  --enable-ipv4-masquerade='true'" subsys=daemon
level=info msg="  --enable-ipv6='false'" subsys=daemon
level=info msg="  --enable-ipv6-masquerade='true'" subsys=daemon
level=info msg="  --enable-ipv6-ndp='false'" subsys=daemon
level=info msg="  --enable-k8s-api-discovery='false'" subsys=daemon
level=info msg="  --enable-k8s-endpoint-slice='true'" subsys=daemon
level=info msg="  --enable-k8s-event-handover='false'" subsys=daemon
level=info msg="  --enable-k8s-terminating-endpoint='true'" subsys=daemon
level=info msg="  --enable-l2-neigh-discovery='true'" subsys=daemon
level=info msg="  --enable-l7-proxy='true'" subsys=daemon
level=info msg="  --enable-local-node-route='true'" subsys=daemon
level=info msg="  --enable-local-redirect-policy='false'" subsys=daemon
level=info msg="  --enable-mke='false'" subsys=daemon
level=info msg="  --enable-monitor='true'" subsys=daemon
level=info msg="  --enable-node-port='false'" subsys=daemon
level=info msg="  --enable-policy='default'" subsys=daemon
level=info msg="  --enable-recorder='false'" subsys=daemon
level=info msg="  --enable-remote-node-identity='true'" subsys=daemon
level=info msg="  --enable-selective-regeneration='true'" subsys=daemon
level=info msg="  --enable-service-topology='false'" subsys=daemon
level=info msg="  --enable-session-affinity='true'" subsys=daemon
level=info msg="  --enable-svc-source-range-check='true'" subsys=daemon
level=info msg="  --enable-tracing='false'" subsys=daemon
level=info msg="  --enable-unreachable-routes='false'" subsys=daemon
level=info msg="  --enable-vtep='false'" subsys=daemon
level=info msg="  --enable-well-known-identities='false'" subsys=daemon
level=info msg="  --enable-wireguard='false'" subsys=daemon
level=info msg="  --enable-wireguard-userspace-fallback='false'" subsys=daemon
level=info msg="  --enable-xdp-prefilter='false'" subsys=daemon
level=info msg="  --enable-xt-socket-fallback='true'" subsys=daemon
level=info msg="  --encrypt-interface=''" subsys=daemon
level=info msg="  --encrypt-node='false'" subsys=daemon
level=info msg="  --endpoint-gc-interval='5m0s'" subsys=daemon
level=info msg="  --endpoint-interface-name-prefix=''" subsys=daemon
level=info msg="  --endpoint-queue-size='25'" subsys=daemon
level=info msg="  --endpoint-status=''" subsys=daemon
level=info msg="  --envoy-config-timeout='2m0s'" subsys=daemon
level=info msg="  --envoy-log=''" subsys=daemon
level=info msg="  --exclude-local-address=''" subsys=daemon
level=info msg="  --fixed-identity-mapping=''" subsys=daemon
level=info msg="  --force-local-policy-eval-at-source='true'" subsys=daemon
level=info msg="  --fqdn-regex-compile-lru-size='1024'" subsys=daemon
level=info msg="  --gops-port='9890'" subsys=daemon
level=info msg="  --host-reachable-services-protos='tcp,udp'" subsys=daemon
level=info msg="  --http-403-msg=''" subsys=daemon
level=info msg="  --http-idle-timeout='0'" subsys=daemon
level=info msg="  --http-max-grpc-timeout='0'" subsys=daemon
level=info msg="  --http-normalize-path='true'" subsys=daemon
level=info msg="  --http-request-timeout='3600'" subsys=daemon
level=info msg="  --http-retry-count='3'" subsys=daemon
level=info msg="  --http-retry-timeout='0'" subsys=daemon
level=info msg="  --hubble-disable-tls='false'" subsys=daemon
level=info msg="  --hubble-event-buffer-capacity='4095'" subsys=daemon
level=info msg="  --hubble-event-queue-size='0'" subsys=daemon
level=info msg="  --hubble-export-file-compress='false'" subsys=daemon
level=info msg="  --hubble-export-file-max-backups='5'" subsys=daemon
level=info msg="  --hubble-export-file-max-size-mb='10'" subsys=daemon
level=info msg="  --hubble-export-file-path=''" subsys=daemon
level=info msg="  --hubble-listen-address=':4244'" subsys=daemon
level=info msg="  --hubble-metrics='dns:query;ignoreAAAA,drop,tcp,flow,icmp,http'" subsys=daemon
level=info msg="  --hubble-metrics-server=':9091'" subsys=daemon
level=info msg="  --hubble-recorder-sink-queue-size='1024'" subsys=daemon
level=info msg="  --hubble-recorder-storage-path='/var/run/cilium/pcaps'" subsys=daemon
level=info msg="  --hubble-socket-path='/var/run/cilium/hubble.sock'" subsys=daemon
level=info msg="  --hubble-tls-cert-file='/var/lib/cilium/tls/hubble/server.crt'" subsys=daemon
level=info msg="  --hubble-tls-client-ca-files='/var/lib/cilium/tls/hubble/client-ca.crt'" subsys=daemon
level=info msg="  --hubble-tls-key-file='/var/lib/cilium/tls/hubble/server.key'" subsys=daemon
level=info msg="  --identity-allocation-mode='crd'" subsys=daemon
level=info msg="  --identity-change-grace-period='5s'" subsys=daemon
level=info msg="  --identity-restore-grace-period='10m0s'" subsys=daemon
level=info msg="  --install-egress-gateway-routes='false'" subsys=daemon
level=info msg="  --install-iptables-rules='true'" subsys=daemon
level=info msg="  --install-no-conntrack-iptables-rules='false'" subsys=daemon
level=info msg="  --ip-allocation-timeout='2m0s'" subsys=daemon
level=info msg="  --ip-masq-agent-config-path='/etc/config/ip-masq-agent'" subsys=daemon
level=info msg="  --ipam='cluster-pool'" subsys=daemon
level=info msg="  --ipsec-key-file=''" subsys=daemon
level=info msg="  --iptables-lock-timeout='5s'" subsys=daemon
level=info msg="  --iptables-random-fully='false'" subsys=daemon
level=info msg="  --ipv4-native-routing-cidr='10.121.0.0/24'" subsys=daemon
level=info msg="  --ipv4-node='auto'" subsys=daemon
level=info msg="  --ipv4-pod-subnets=''" subsys=daemon
level=info msg="  --ipv4-range='auto'" subsys=daemon
level=info msg="  --ipv4-service-loopback-address='169.254.42.1'" subsys=daemon
level=info msg="  --ipv4-service-range='auto'" subsys=daemon
level=info msg="  --ipv6-cluster-alloc-cidr='f00d::/64'" subsys=daemon
level=info msg="  --ipv6-mcast-device=''" subsys=daemon
level=info msg="  --ipv6-native-routing-cidr=''" subsys=daemon
level=info msg="  --ipv6-node='auto'" subsys=daemon
level=info msg="  --ipv6-pod-subnets=''" subsys=daemon
level=info msg="  --ipv6-range='auto'" subsys=daemon
level=info msg="  --ipv6-service-range='auto'" subsys=daemon
level=info msg="  --ipvlan-master-device='undefined'" subsys=daemon
level=info msg="  --join-cluster='false'" subsys=daemon
level=info msg="  --k8s-api-server=''" subsys=daemon
level=info msg="  --k8s-heartbeat-timeout='30s'" subsys=daemon
level=info msg="  --k8s-kubeconfig-path=''" subsys=daemon
level=info msg="  --k8s-namespace='kube-system'" subsys=daemon
level=info msg="  --k8s-require-ipv4-pod-cidr='false'" subsys=daemon
level=info msg="  --k8s-require-ipv6-pod-cidr='false'" subsys=daemon
level=info msg="  --k8s-service-cache-size='128'" subsys=daemon
level=info msg="  --k8s-service-proxy-name=''" subsys=daemon
level=info msg="  --k8s-sync-timeout='3m0s'" subsys=daemon
level=info msg="  --k8s-watcher-endpoint-selector='metadata.name!=kube-scheduler,metadata.name!=kube-controller-manager,metadata.name!=etcd-operator,metadata.name!=gcp-controller-manager'" subsys=daemon
level=info msg="  --keep-config='false'" subsys=daemon
level=info msg="  --kube-proxy-replacement='strict'" subsys=daemon
level=info msg="  --kube-proxy-replacement-healthz-bind-address=''" subsys=daemon
level=info msg="  --kvstore=''" subsys=daemon
level=info msg="  --kvstore-connectivity-timeout='2m0s'" subsys=daemon
level=info msg="  --kvstore-lease-ttl='15m0s'" subsys=daemon
level=info msg="  --kvstore-max-consecutive-quorum-errors='2'" subsys=daemon
level=info msg="  --kvstore-opt=''" subsys=daemon
level=info msg="  --kvstore-periodic-sync='5m0s'" subsys=daemon
level=info msg="  --label-prefix-file=''" subsys=daemon
level=info msg="  --labels=''" subsys=daemon
level=info msg="  --lib-dir='/var/lib/cilium'" subsys=daemon
level=info msg="  --local-max-addr-scope='252'" subsys=daemon
level=info msg="  --local-router-ipv4=''" subsys=daemon
level=info msg="  --local-router-ipv6=''" subsys=daemon
level=info msg="  --log-driver=''" subsys=daemon
level=info msg="  --log-opt=''" subsys=daemon
level=info msg="  --log-system-load='false'" subsys=daemon
level=info msg="  --max-controller-interval='0'" subsys=daemon
level=info msg="  --metrics=''" subsys=daemon
level=info msg="  --mke-cgroup-mount=''" subsys=daemon
level=info msg="  --monitor-aggregation='medium'" subsys=daemon
level=info msg="  --monitor-aggregation-flags='all'" subsys=daemon
level=info msg="  --monitor-aggregation-interval='5s'" subsys=daemon
level=info msg="  --monitor-queue-size='0'" subsys=daemon
level=info msg="  --mtu='0'" subsys=daemon
level=info msg="  --native-routing-cidr=''" subsys=daemon
level=info msg="  --node-port-acceleration='disabled'" subsys=daemon
level=info msg="  --node-port-algorithm='random'" subsys=daemon
level=info msg="  --node-port-bind-protection='true'" subsys=daemon
level=info msg="  --node-port-mode='snat'" subsys=daemon
level=info msg="  --node-port-range='30000,32767'" subsys=daemon
level=info msg="  --policy-audit-mode='false'" subsys=daemon
level=info msg="  --policy-queue-size='100'" subsys=daemon
level=info msg="  --policy-trigger-interval='1s'" subsys=daemon
level=info msg="  --pprof='true'" subsys=daemon
level=info msg="  --pprof-port='6060'" subsys=daemon
level=info msg="  --preallocate-bpf-maps='false'" subsys=daemon
level=info msg="  --prefilter-device='undefined'" subsys=daemon
level=info msg="  --prefilter-mode='native'" subsys=daemon
level=info msg="  --prepend-iptables-chains='true'" subsys=daemon
level=info msg="  --procfs='/host/proc'" subsys=daemon
level=info msg="  --prometheus-serve-addr=':9090'" subsys=daemon
level=info msg="  --proxy-connect-timeout='1'" subsys=daemon
level=info msg="  --proxy-gid='1337'" subsys=daemon
level=info msg="  --proxy-max-connection-duration-seconds='0'" subsys=daemon
level=info msg="  --proxy-max-requests-per-connection='0'" subsys=daemon
level=info msg="  --proxy-prometheus-port='9095'" subsys=daemon
level=info msg="  --read-cni-conf=''" subsys=daemon
level=info msg="  --restore='true'" subsys=daemon
level=info msg="  --route-metric='0'" subsys=daemon
level=info msg="  --sidecar-istio-proxy-image='cilium/istio_proxy'" subsys=daemon
level=info msg="  --single-cluster-route='false'" subsys=daemon
level=info msg="  --socket-path='/var/run/cilium/cilium.sock'" subsys=daemon
level=info msg="  --sockops-enable='false'" subsys=daemon
level=info msg="  --state-dir='/var/run/cilium'" subsys=daemon
level=info msg="  --tofqdns-dns-reject-response-code='refused'" subsys=daemon
level=info msg="  --tofqdns-enable-dns-compression='true'" subsys=daemon
level=info msg="  --tofqdns-endpoint-max-ip-per-hostname='50'" subsys=daemon
level=info msg="  --tofqdns-idle-connection-grace-period='0s'" subsys=daemon
level=info msg="  --tofqdns-max-deferred-connection-deletes='10000'" subsys=daemon
level=info msg="  --tofqdns-min-ttl='0'" subsys=daemon
level=info msg="  --tofqdns-pre-cache=''" subsys=daemon
level=info msg="  --tofqdns-proxy-port='0'" subsys=daemon
level=info msg="  --tofqdns-proxy-response-max-delay='100ms'" subsys=daemon
level=info msg="  --trace-payloadlen='128'" subsys=daemon
level=info msg="  --tunnel='geneve'" subsys=daemon
level=info msg="  --tunnel-port='0'" subsys=daemon
level=info msg="  --version='false'" subsys=daemon
level=info msg="  --vlan-bpf-bypass=''" subsys=daemon
level=info msg="  --vtep-cidr=''" subsys=daemon
level=info msg="  --vtep-endpoint=''" subsys=daemon
level=info msg="  --vtep-mac=''" subsys=daemon
level=info msg="  --vtep-mask=''" subsys=daemon
level=info msg="  --write-cni-conf-when-ready=''" subsys=daemon
level=info msg="     _ _ _" subsys=daemon
level=info msg=" ___|_| |_|_ _ _____" subsys=daemon
level=info msg="|  _| | | | | |     |" subsys=daemon
level=info msg="|___|_|_|_|___|_|_|_|" subsys=daemon
level=info msg="Cilium 1.12.0-rc2 814ffce 2022-05-04T17:35:03+02:00 go version go1.18.1 linux/amd64" subsys=daemon
level=info msg="cilium-envoy  version: 12e3081cc292764b1308668cab1e7e523429bedc/1.21.1/Distribution/RELEASE/BoringSSL" subsys=daemon
level=info msg="clang (10.0.0) and kernel (5.15.0) versions: OK!" subsys=linux-datapath
level=info msg="linking environment: OK!" subsys=linux-datapath
level=info msg="Detected mounted BPF filesystem at /sys/fs/bpf" subsys=bpf
level=info msg="Mounted cgroupv2 filesystem at /run/cilium/cgroupv2" subsys=cgroups
level=info msg="Parsing base label prefixes from default label list" subsys=labels-filter
level=info msg="Parsing additional label prefixes from user inputs: []" subsys=labels-filter
level=info msg="Final label prefixes to be used for identity evaluation:" subsys=labels-filter
level=info msg=" - reserved:.*" subsys=labels-filter
level=info msg=" - :io\\.kubernetes\\.pod\\.namespace" subsys=labels-filter
level=info msg=" - :io\\.cilium\\.k8s\\.namespace\\.labels" subsys=labels-filter
level=info msg=" - :app\\.kubernetes\\.io" subsys=labels-filter
level=info msg=" - !:io\\.kubernetes" subsys=labels-filter
level=info msg=" - !:kubernetes\\.io" subsys=labels-filter
level=info msg=" - !:.*beta\\.kubernetes\\.io" subsys=labels-filter
level=info msg=" - !:k8s\\.io" subsys=labels-filter
level=info msg=" - !:pod-template-generation" subsys=labels-filter
level=info msg=" - !:pod-template-hash" subsys=labels-filter
level=info msg=" - !:controller-revision-hash" subsys=labels-filter
level=info msg=" - !:annotation.*" subsys=labels-filter
level=info msg=" - !:etcd_node" subsys=labels-filter
level=info msg="Auto-disabling \"enable-bpf-clock-probe\" feature since KERNEL_HZ cannot be determined" error="Cannot probe CONFIG_HZ" subsys=daemon
level=info msg="Using autogenerated IPv4 allocation range" subsys=node v4Prefix=10.64.0.0/16
level=info msg="Initializing daemon" subsys=daemon
level=info msg="Establishing connection to apiserver" host="https://127.0.0.1:16443" subsys=k8s
level=info msg="Connected to apiserver" subsys=k8s
level=info msg="Trying to auto-enable \"enable-node-port\", \"enable-external-ips\", \"enable-host-reachable-services\", \"enable-host-port\", \"enable-session-affinity\" features" subsys=daemon
level=warning msg="Disabling NodePort's \"dsr\" mode feature due to tunneling mode being enabled" subsys=daemon
level=info msg="Inheriting MTU from external network interface" device=eth0 ipAddr=162.55.58.64 mtu=1500 subsys=mtu
level=info msg="Restoring 3 old CIDR identities" subsys=daemon
level=info msg="regenerating all endpoints" reason="one or more identities created or deleted" subsys=endpoint-manager
level=info msg="Envoy: Starting xDS gRPC server listening on /var/run/cilium/xds.sock" subsys=envoy-manager
level=info msg="Restored backends from maps" failedBackends=0 restoredBackends=9 subsys=service
level=info msg="Restored services from maps" failedServices=0 restoredServices=10 subsys=service
level=info msg="Reading old endpoints..." subsys=daemon
level=warning msg="Found incomplete restore directory /var/run/cilium/state/751_next_fail. Removing it..." endpointID=751_next_fail subsys=endpoint
level=warning msg="Found incomplete restore directory /var/run/cilium/state/918_next_fail. Removing it..." endpointID=918_next_fail subsys=endpoint
level=info msg="Reusing previous DNS proxy port: 37209" subsys=daemon
level=info msg="Waiting until all Cilium CRDs are available" subsys=k8s
level=info msg="All Cilium CRDs have been found and are available" subsys=k8s
level=info msg="Creating or updating CiliumNode resource" node=master-3.cluster.example.org subsys=nodediscovery
level=info msg="Retrieved node information from cilium node" nodeName=master-3.cluster.example.org subsys=k8s
level=info msg="Received own node information from API server" ipAddr.ipv4=10.121.0.4 ipAddr.ipv6="<nil>" k8sNodeIP=10.121.0.4 labels="map[beta.kubernetes.io/arch:amd64 beta.kubernetes.io/os:linux egress.k3s.io/cluster:true kubernetes.io/arch:amd64 kubernetes.io/hostname:master-3.cluster.example.org kubernetes.io/os:linux node-access:restricted node-role.kubernetes.io/control-plane:true node-role.kubernetes.io/etcd:true node-role.kubernetes.io/master:true]" nodeName=master-3.cluster.example.org subsys=k8s v4Prefix=10.251.3.0/24 v6Prefix="<nil>"
level=info msg="k8s mode: Allowing localhost to reach local endpoints" subsys=daemon
level=info msg="Detected devices" devices="[ens10 eth0]" subsys=daemon
level=info msg="BPF host routing is currently not supported with enable-endpoint-routes. Falling back to legacy host routing (enable-host-legacy-routing=true)." subsys=daemon
level=info msg="Masquerading IP selected for device" device=ens10 ipv4=10.121.0.4 subsys=node
level=info msg="Masquerading IP selected for device" device=eth0 ipv4=162.55.58.64 subsys=node
level=info msg="Enabling k8s event listener" subsys=k8s-watcher
level=info msg="Removing stale endpoint interfaces" subsys=daemon
level=info msg="Waiting until all pre-existing resources have been received" subsys=k8s-watcher
level=info msg="Skipping kvstore configuration" subsys=daemon
level=info msg="Restored router address from node_config" file=/var/run/cilium/state/globals/node_config.h ipv4=10.251.3.241 ipv6="<nil>" subsys=node
level=info msg="Initializing node addressing" subsys=daemon
level=info msg="Initializing cluster-pool IPAM" subsys=ipam v4Prefix=10.251.3.0/24 v6Prefix="<nil>"
level=info msg="Restoring endpoints..." subsys=daemon
level=warning msg="Unable to restore endpoint, ignoring" endpointID=1443 error="interface lxc97b6d26f0602 could not be found" k8sPodName=kube-system/hcloud-csi-driver-node-nqk99 subsys=daemon
level=info msg="Endpoints restored" failed=1 restored=2 subsys=daemon
level=info msg="Addressing information:" subsys=daemon
level=info msg="  Cluster-Name: default" subsys=daemon
level=info msg="  Cluster-ID: 0" subsys=daemon
level=info msg="  Local node-name: master-3.cluster.example.org" subsys=daemon
level=info msg="  Node-IPv6: <nil>" subsys=daemon
level=info msg="  External-Node IPv4: 10.121.0.4" subsys=daemon
level=info msg="  Internal-Node IPv4: 10.251.3.241" subsys=daemon
level=info msg="  IPv4 allocation prefix: 10.251.3.0/24" subsys=daemon
level=info msg="  IPv4 native routing prefix: 10.121.0.0/24" subsys=daemon
level=info msg="  Loopback IPv4: 169.254.42.1" subsys=daemon
level=info msg="  Local IPv4 addresses:" subsys=daemon
level=info msg="  - 162.55.58.64" subsys=daemon
level=info msg="  - 10.121.0.4" subsys=daemon
level=info msg="  - 10.251.3.241" subsys=daemon
level=info msg="Creating or updating CiliumNode resource" node=master-3.cluster.example.org subsys=nodediscovery
level=info msg="Adding local node to cluster" node="{master-3.cluster.example.org default [{InternalIP 10.121.0.4} {CiliumInternalIP 10.251.3.241}] 10.251.3.0/24 [] <nil> [] 10.251.3.109 <nil> 0 local 0 map[beta.kubernetes.io/arch:amd64 beta.kubernetes.io/os:linux egress.k3s.io/cluster:true kubernetes.io/arch:amd64 kubernetes.io/hostname:master-3.cluster.example.org kubernetes.io/os:linux node-access:restricted node-role.kubernetes.io/control-plane:true node-role.kubernetes.io/etcd:true node-role.kubernetes.io/master:true] 1 }" subsys=nodediscovery
level=info msg="All pre-existing resources have been received; continuing" subsys=k8s-watcher
level=info msg="Initializing identity allocator" subsys=identity-cache
level=info msg="Cluster-ID is not specified, skipping ClusterMesh initialization" subsys=daemon
level=info msg="Setting up BPF bandwidth manager" subsys=bandwidth-manager
level=info msg="Setting sysctl" subsys=bandwidth-manager sysParamName=net.core.netdev_max_backlog sysParamValue=1000
level=info msg="Setting sysctl" subsys=bandwidth-manager sysParamName=net.core.somaxconn sysParamValue=4096
level=info msg="Setting sysctl" subsys=bandwidth-manager sysParamName=net.core.default_qdisc sysParamValue=fq
level=info msg="Setting sysctl" subsys=bandwidth-manager sysParamName=net.ipv4.tcp_max_syn_backlog sysParamValue=4096
level=info msg="Setting sysctl" subsys=bandwidth-manager sysParamName=net.ipv4.tcp_congestion_control sysParamValue=cubic
level=info msg="Setting qdisc to fq" device=ens10 subsys=bandwidth-manager
level=info msg="Setting qdisc to fq" device=eth0 subsys=bandwidth-manager
level=info msg="Setting up BPF datapath" bpfClockSource=ktime bpfInsnSet=v3 subsys=datapath-loader
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.core.bpf_jit_enable sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.all.rp_filter sysParamValue=0
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.fib_multipath_use_neigh sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=kernel.unprivileged_bpf_disabled sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=kernel.timer_migration sysParamValue=0
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.cilium_host.forwarding sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.cilium_host.rp_filter sysParamValue=0
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.cilium_host.accept_local sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.cilium_host.send_redirects sysParamValue=0
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.cilium_net.forwarding sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.cilium_net.rp_filter sysParamValue=0
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.cilium_net.accept_local sysParamValue=1
level=info msg="Setting sysctl" subsys=sysctl sysParamName=net.ipv4.conf.cilium_net.send_redirects sysParamValue=0
level=info msg="regenerating all endpoints" reason="one or more identities created or deleted" subsys=endpoint-manager
level=info msg="regenerating all endpoints" reason="one or more identities created or deleted" subsys=endpoint-manager
level=info msg="regenerating all endpoints" reason= subsys=endpoint-manager
level=info msg="Exiting due to signal" signal=terminated subsys=daemon
level=info msg="Waiting for all endpoints' go routines to be stopped." subsys=daemon
level=info msg="All endpoints' goroutines stopped." subsys=daemon
level=error msg="Command execution failed" cmd="[/var/lib/cilium/bpf/init.sh /var/lib/cilium/bpf /var/run/cilium/state /host/proc/sys/net /sys/class/net 10.251.3.241 <nil> tunnel geneve 6081 ens10;eth0 cilium_host cilium_net 1500 true true true /run/cilium/cgroupv2 /sys/fs/bpf true true v3 4 true true 1]" error="signal: killed" subsys=datapath-loader
level=fatal msg="Error while creating daemon" error="error while initializing daemon: failed while reinitializing datapath: Command execution failed for [/var/lib/cilium/bpf/init.sh /var/lib/cilium/bpf /var/run/cilium/state /host/proc/sys/net /sys/class/net 10.251.3.241 <nil> tunnel geneve 6081 ens10;eth0 cilium_host cilium_net 1500 true true true /run/cilium/cgroupv2 /sys/fs/bpf true true v3 4 true true 1]: context canceled" subsys=daemon

Anything else?

No response

Code of Conduct

rlex commented 2 years ago

I also have rp_filter disabled explicitly at boot on all interfaces:

lex@node-1 ⇣⇡ ❯ sysctl net.ipv4.conf.all.rp_filter 
net.ipv4.conf.all.rp_filter = 0
lex@node-1 ⇣⇡ ❯ sysctl net.ipv4.conf.lxc42edfebcb30f.rp_filter 
net.ipv4.conf.lxc42edfebcb30f.rp_filter = 0
lex@node-1 ⇣⇡ ❯ sysctl net.ipv4.conf.eth0.rp_filter 
net.ipv4.conf.eth0.rp_filter = 0
lex@node-1 ⇣⇡ ❯ sysctl net.ipv4.conf.ens10.rp_filter 
net.ipv4.conf.ens10.rp_filter = 0
rlex commented 2 years ago

There is a somewhat similar issue in the k3s repo, https://github.com/k3s-io/k3s/issues/5188, but I have had network policy disabled for quite some time, and that issue affects outside connections, not the agent as a whole.

vincentmli commented 2 years ago

If you could provide a sysdump or clear steps to reproduce the issue, it would be helpful. Does the problem also occur with a fresh Ubuntu 22.04 and a fresh k3s install?
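
(For reference, a sysdump is typically collected with the Cilium CLI from any machine with kubeconfig access to the cluster; a minimal sketch:)

cilium sysdump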

rlex commented 2 years ago

It was happening on a fresh install too. I'll try to create a single-node k3s cluster on the same provider with the same settings.

rlex commented 2 years ago

Okay, fresh node with 1.23.7, crashing now. Where should I send the sysdump?

vincentmli commented 2 years ago

When you add a new comment, there is an area at the bottom where you can click and attach files. Are you able to reproduce it on a single-node k3s install? Which provider hosts your Ubuntu VM?

rlex commented 2 years ago

My VM provider is Hetzner Cloud. Pretty much a bare install except for the typical /etc/hostname, mailer, etc. tweaks.

k3s config (goes to /etc/rancher/k3s/config.yaml):

#master only stuff
cluster-init: true
disable:
- metrics-server
- traefik
- servicelb
- coredns

flannel-backend: 'none'
cluster-cidr: 10.251.0.0/16
disable-cloud-controller: true
disable-kube-proxy: true
disable-network-policy: true
etcd-expose-metrics: true
kube-controller-manager-arg:
- bind-address=0.0.0.0
kube-proxy-arg:
- metrics-bind-address=0.0.0.0
kube-scheduler-arg:
- bind-address=0.0.0.0
kubelet-arg:
- cloud-provider=external

#generic stuff
node-external-ip: YOUR_EXTERNAL_IP
node-ip: YOUR_INTERNAL_IP

You can probably skip

disable-cloud-controller: true
kubelet-arg:
- cloud-provider=external

Since these are only needed for provisioning via an external cloud provider and probably don't matter here, given that Cilium has a toleration for node.cloudprovider.kubernetes.io/uninitialized: true.

No arguments are passed to the k3s binary; everything goes via the config.
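
(A small sketch of how the config is picked up, assuming a systemd-managed k3s server install; k3s reads the file on startup:)

mkdir -p /etc/rancher/k3s
cp config.yaml /etc/rancher/k3s/config.yaml
systemctl restart k3s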

Helm values were already provided in the ticket.

Nothing is deployed except Cilium and CoreDNS (which fails to start because of the cloud-provider taint).

Sysdump attached too.

cilium-sysdump-20220628-235639.zip

rlex commented 2 years ago

I also checked that rp_filter is applied correctly:

root@master-1:~# sysctl net.ipv4.conf.cilium_geneve.rp_filter
net.ipv4.conf.cilium_geneve.rp_filter = 0
root@master-1:~# sysctl net.ipv4.conf.eth0.rp_filter
net.ipv4.conf.eth0.rp_filter = 0
root@master-1:~# sysctl net.ipv4.conf.enp7s0.rp_filter
net.ipv4.conf.enp7s0.rp_filter = 0
root@master-1:~# sysctl net.ipv4.conf.lxcdd738a58e10a.rp_filter
net.ipv4.conf.lxcdd738a58e10a.rp_filter = 0
vincentmli commented 2 years ago

The sysdump did not complete, I think because the cilium-agent is not working:

⚠️ cniconflist-cilium-44zfq: unable to upgrade connection: container not found ("cilium-agent")
⚠️ gops-cilium-44zfq-memstats: failed to list processes "cilium-44zfq" ("cilium-agent") in namespace "kube-system": unable to upgrade connection: container not found ("cilium-agent")

I have this sysctl rp_filter override in 22.04; can you try applying it before the k3s install?

cat /etc/sysctl.d/99-override_cilium_rp_filter.conf

net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.*.rp_filter = 0
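
(To apply this without a reboot, reloading all sysctl.d files should be enough; a sketch:)

sysctl --system
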
rlex commented 2 years ago

rp_filter is disabled via my k3s Ansible role, but I applied it manually just in case and nothing changed. It looks like 1.12.0-rc3 already handles this:

root@master-1:~# cat /etc/sysctl.d/99-zzz-override_cilium.conf
# Disable rp_filter on Cilium interfaces since it may cause mangled packets to be dropped
net.ipv4.conf.lxc*.rp_filter = 0
net.ipv4.conf.cilium_*.rp_filter = 0
# The kernel uses max(conf.all, conf.{dev}) as its value, so we need to set .all. to 0 as well.
# Otherwise it will overrule the device specific settings.
net.ipv4.conf.all.rp_filter = 0

Interestingly, the operator is crashing too.

vincentmli commented 2 years ago

By the way, this is how I install k3s on an Ubuntu 22.04 VM with two network interfaces: one interface is behind the company proxy for internet connectivity (10.3.72.9), and the other is for the internal network (10.169.72.9). It works fine, but I do need to have the rp_filter override configured before the k3s install, otherwise it won't work.

curl -sfL https://get.k3s.io | INSTALL_K3S_SYMLINK=force INSTALL_K3S_VERSION='v1.24.1+k3s1' INSTALL_K3S_EXEC='--flannel-backend=none --node-ip=10.169.72.9 --node-external-ip=10.3.72.9 --disable=traefik --disable-kube-proxy --disable-network-policy --kube-apiserver-arg=kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname' sh -

cilium install

cilium install --version=v1.12.0-rc3 --kube-proxy-replacement strict --helm-set-string=k8sServiceHost=10.3.72.9,k8sServicePort=6443
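
(To verify that kind of minimal install, the CLI's own checks are handy; a sketch:)

cilium status --wait
cilium connectivity test
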
rlex commented 2 years ago

I noticed that I lacked

net.ipv4.conf.default.rp_filter = 0

However, all interfaces still had rp_filter disabled.

Just to make sure, I added it to 99-sysctl.conf and rebooted the node, with no luck.

These are the last messages in the logs before cilium-agent gets restarted:

level=error msg="Command execution failed" cmd="[/var/lib/cilium/bpf/init.sh /var/lib/cilium/bpf /var/run/cilium/state /host/proc/sys/net /sys/class/net 10.251.0.78 <nil> tunnel geneve 6081 enp7s0;eth0 cilium_host cilium_net 1500 true true true /run/cilium/cgroupv2 /sys/fs/bpf true true v3 3 true true 1]" error="signal: killed" subsys=datapath-loader
level=fatal msg="Error while creating daemon" error="error while initializing daemon: failed while reinitializing datapath: Command execution failed for [/var/lib/cilium/bpf/init.sh /var/lib/cilium/bpf /var/run/cilium/state /host/proc/sys/net /sys/class/net 10.251.0.78 <nil> tunnel geneve 6081 enp7s0;eth0 cilium_host cilium_net 1500 true true true /run/cilium/cgroupv2 /sys/fs/bpf true true v3 3 true true 1]: context canceled" subsys=daemon

And another run:

level=debug msg="Skipping CiliumEndpoint update because it has no k8s pod name" containerID= controller="sync-to-k8s-ciliumendpoint (3605)" datapathPolicyRevision=1 desiredPolicyRevision=1 endpointID=3605 identity=4 ipv4= ipv6= k8sPodName=/ subsys=endpointsynchronizer
level=debug msg="Controller func execution time: 112.803µs" name="sync-to-k8s-ciliumendpoint (3605)" subsys=controller uuid=35c229f3-e3fa-43b2-9821-f9a45d635ed8
level=debug msg="Controller func execution time: 181.832µs" name=metricsmap-bpf-prom-sync subsys=controller uuid=16644322-5264-426b-a5a0-a910880228b2
level=debug msg="Handling request for /healthz" subsys=health-server
level=debug msg="Controller func execution time: 2.705µs" name=bpf-map-sync-cilium_lxc subsys=controller uuid=09615ada-f6f2-4735-9a13-1b851dccef9c
level=debug msg="Controller func execution time: 3.106µs" name=bpf-map-sync-cilium_throttle subsys=controller uuid=0681e6d2-7353-4f88-b38a-d0f96e318edb
level=debug msg="Controller func execution time: 198.484µs" name=metricsmap-bpf-prom-sync subsys=controller uuid=16644322-5264-426b-a5a0-a910880228b2
level=debug msg="Handling request for /healthz" subsys=health-server
level=debug msg="Skip pod event using host networking" k8sNamespace=kube-system k8sPodName=cilium-operator-6694d646b8-6vqh9 new-hostIP=10.31.0.2 new-podIP=10.31.0.2 new-podIPs="[{10.31.0.2}]" old-hostIP=10.31.0.2 old-podIP=10.31.0.2 old-podIPs="[{10.31.0.2}]" subsys=k8s-watcher
level=debug msg="Kubernetes service definition changed" action=service-updated endpoints="10.31.0.2:6942/TCP" k8sNamespace=kube-system k8sSvcName=cilium-operator old-service=nil service="frontends:[]/ports=[metrics]/selector=map[io.cilium/app:operator name:cilium-operator]" subsys=k8s-watcher
level=debug msg="Upserting IP into ipcache layer" identity="{host kube-apiserver false}" ipAddr=95.217.22.103 key=0 subsys=ipcache
level=debug msg="Daemon notified of IP-Identity cache state change" identity="{host kube-apiserver false}" ipAddr="{95.217.22.103 ffffffff}" modification=Upsert subsys=datapath-ipcache
level=debug msg="Upserting IP into ipcache layer" identity="{host local false}" ipAddr=10.31.0.2 key=0 subsys=ipcache
level=debug msg="Daemon notified of IP-Identity cache state change" identity="{host local false}" ipAddr="{10.31.0.2 ffffffff}" modification=Upsert subsys=datapath-ipcache
level=debug msg="Upserting IP into ipcache layer" identity="{host local false}" ipAddr=10.251.0.119 key=0 subsys=ipcache
level=debug msg="Daemon notified of IP-Identity cache state change" identity="{host local false}" ipAddr="{10.251.0.119 ffffffff}" modification=Upsert subsys=datapath-ipcache
level=debug msg="Upserting IP into ipcache layer" identity="{world local false}" ipAddr=0.0.0.0/0 key=0 subsys=ipcache
level=debug msg="Daemon notified of IP-Identity cache state change" identity="{world local false}" ipAddr="{0.0.0.0 00000000}" modification=Upsert subsys=datapath-ipcache
level=debug msg="Controller func execution time: 1.352153ms" name=sync-endpoints-and-host-ips subsys=controller uuid=9a459bd7-d181-43a0-b398-3803a12e64fd
level=debug msg="Skipping CiliumEndpoint update because it has no k8s pod name" containerID= controller="sync-to-k8s-ciliumendpoint (2983)" datapathPolicyRevision=1 desiredPolicyRevision=1 endpointID=2983 identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpointsynchronizer
level=debug msg="Controller func execution time: 87.995µs" name="sync-to-k8s-ciliumendpoint (2983)" subsys=controller uuid=2bb957f8-4912-4733-addf-959638d8675e
level=debug msg="Skipping CiliumEndpoint update because it has not changed" containerID=d23d499354 controller="sync-to-k8s-ciliumendpoint (1825)" datapathPolicyRevision=1 desiredPolicyRevision=1 endpointID=1825 identity=40815 ipv4=10.251.0.233 ipv6= k8sPodName=kube-system/coredns-655b9bc459-fpmp9 subsys=endpointsynchronizer
level=debug msg="Controller func execution time: 117.291µs" name="sync-to-k8s-ciliumendpoint (1825)" subsys=controller uuid=d1f92af6-964d-4c22-a169-7b477de0280b
level=debug msg="Controller func execution time: 1.813µs" name=bpf-map-sync-cilium_lxc subsys=controller uuid=09615ada-f6f2-4735-9a13-1b851dccef9c
level=debug msg="Controller func execution time: 388.791µs" name=link-cache subsys=controller uuid=83be8b7c-d5f4-4882-970e-eea8d973114a
level=debug msg="Controller func execution time: 2.445µs" name=bpf-map-sync-cilium_throttle subsys=controller uuid=0681e6d2-7353-4f88-b38a-d0f96e318edb
level=debug msg="Controller func execution time: 1.341742ms" name=cilium-health-ep subsys=controller uuid=3fe75ff7-ba22-4f8a-b7d8-3c04a8ec3dfb
level=debug msg="Skipping CiliumEndpoint update because it has no k8s pod name" containerID= controller="sync-to-k8s-ciliumendpoint (3605)" datapathPolicyRevision=1 desiredPolicyRevision=1 endpointID=3605 identity=4 ipv4= ipv6= k8sPodName=/ subsys=endpointsynchronizer
level=debug msg="Controller func execution time: 174.97µs" name="sync-to-k8s-ciliumendpoint (3605)" subsys=controller uuid=35c229f3-e3fa-43b2-9821-f9a45d635ed8
level=debug msg="Controller func execution time: 270.538µs" name=metricsmap-bpf-prom-sync subsys=controller uuid=16644322-5264-426b-a5a0-a910880228b2
level=info msg="Exiting due to signal" signal=terminated subsys=daemon
level=debug msg="canceling context in signal handler" subsys=daemon
level=info msg="Shutting down... " subsys=health-server
level=info msg="HTTP server Shutdown: context deadline exceeded" subsys=health-server
level=debug msg="Killing old health endpoint process" pidfile=/var/run/cilium/state/health-endpoint.pid subsys=cilium-health-launcher
level=info msg="Stopped serving cilium health API at unix:///var/run/cilium/health.sock" subsys=health-server
level=debug msg="Killed endpoint process" pid=522 pidfile=/var/run/cilium/state/health-endpoint.pid subsys=cilium-health-launcher
level=info msg="Shutting down... " subsys=daemon
level=debug msg="Didn't find existing device" error="Link not found" subsys=cilium-health-launcher veth=cilium_health
level=info msg="HTTP server Shutdown: context deadline exceeded" subsys=daemon
level=info msg="Stopped serving cilium API at unix:///var/run/cilium/cilium.sock" subsys=daemon
level=debug msg="exiting retrying regeneration goroutine due to endpoint being deleted" containerID= datapathPolicyRevision=1 desiredPolicyRevision=1 endpointID=2983 identity=1 ipv4= ipv6= k8sPodName=/ subsys=endpoint
level=debug msg="Controller func execution time: 1m1.51729468s" name=endpoint-2983-regeneration-recovery subsys=controller uuid=f35246b8-5ddd-4c46-b1ca-dd804fcc6987
level=debug msg="exiting retrying regeneration goroutine due to endpoint being deleted" containerID=d23d499354 datapathPolicyRevision=1 desiredPolicyRevision=1 endpointID=1825 identity=40815 ipv4=10.251.0.233 ipv6= k8sPodName=kube-system/coredns-655b9bc459-fpmp9 subsys=endpoint
level=debug msg="Controller run succeeded; waiting for next controller update or stop" name=endpoint-2983-regeneration-recovery subsys=controller uuid=f35246b8-5ddd-4c46-b1ca-dd804fcc6987
level=debug msg="Controller func execution time: 1m1.516447469s" name=endpoint-1825-regeneration-recovery subsys=controller uuid=a292e931-75be-43fb-888e-d69372ea10b9
level=debug msg="Controller run succeeded; waiting for next controller update or stop" name=endpoint-1825-regeneration-recovery subsys=controller uuid=a292e931-75be-43fb-888e-d69372ea10b9
level=debug msg="Process exited" cmd="ip [netns exec cilium-health cilium-health-responder --listen 4240 --pidfile /var/run/cilium/state/health-endpoint.pid]" exitCode="signal: killed" subsys=launcher
level=info msg="Waiting for all endpoints' go routines to be stopped." subsys=daemon
level=debug msg="stopping EventQueue" name=endpoint-2983 subsys=eventqueue
level=debug msg="stopping EventQueue" name=endpoint-1825 subsys=eventqueue
level=debug msg="stopping EventQueue" name=endpoint-3605 subsys=eventqueue
level=info msg="All endpoints' goroutines stopped." subsys=daemon
rlex commented 2 years ago

Could some of the custom parameters be causing the crashloop? BPF masquerade? DSR? Geneve tunneling?

vincentmli commented 2 years ago

Could be. I did notice you have quite a few custom settings; can you install k3s with curl and Cilium with cilium-cli the way I did, just for testing, to see if it crashes?

rlex commented 2 years ago

Yep, went green. No crashes.
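
A natural follow-up from here is to bisect the custom values by re-adding them one at a time on top of the working minimal install and re-testing after each change. A rough sketch, assuming a plain Helm-managed install (release name, namespace, and the order of suspects are illustrative):

helm upgrade cilium cilium/cilium -n kube-system --reuse-values --set tunnel=geneve
# re-test, then move on to the next suspect
helm upgrade cilium cilium/cilium -n kube-system --reuse-values --set bpf.masquerade=true
helm upgrade cilium cilium/cilium -n kube-system --reuse-values --set loadBalancer.mode=dsr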

github-actions[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

github-actions[bot] commented 2 years ago

This issue has not seen any activity since it was marked stale. Closing.