cilium / cilium

eBPF-based Networking, Security, and Observability
https://cilium.io
Apache License 2.0

CI: NAT46X64 (ci-l4lb) times out waiting for lb-node #24728

Closed: lmb closed this issue 8 months ago

lmb commented 1 year ago
+[17:34:05] docker exec -t lb-node docker ps
+[17:34:05] sleep 1
+[17:34:06] docker exec -t lb-node docker ps
Error response from daemon: Container 5d9b6a3517e51f632cc819bfb4f86b5b191358d3a25ef337364719590b483226 is not running

This happens somewhat regularly:

More complete list: https://github.com/cilium/cilium/actions/workflows/tests-l4lb.yaml?query=is%3Afailure

lmb commented 1 year ago

There are variations on a theme here. The error I saw was because lb-node was gone all of a sudden. It seems other containers also just vanish:

+[12:43:45] docker exec -t lb-node docker run --name cilium-lb -td -v /sys/fs/bpf:/sys/fs/bpf -v /lib/modules:/lib/modules --privileged=true --network=host quay.io/cilium/cilium-ci:e8296f91aadc403b3513618771c7b1b6f539056b cilium-agent --enable-ipv4=true --enable-ipv6=true --devices=eth0 --datapath-mode=lb-only --bpf-lb-algorithm=maglev --bpf-lb-dsr-dispatch=ipip --bpf-lb-acceleration=native --bpf-lb-mode=snat
ab9e5fc98340dfb6a69ebebf300bd45af04c6b99a31de37281a4423316e5c6a6
+[12:43:46] docker exec -t lb-node docker exec -t cilium-lb cilium status
Get "http:///var/run/cilium/cilium.sock/v1/healthz": dial unix /var/run/cilium/cilium.sock: connect: no such file or directory

Is the agent running?

+[12:43:46] sleep 3
+[12:43:49] docker exec -t lb-node docker exec -t cilium-lb cilium status
Error response from daemon: Container ab9e5fc98340dfb6a69ebebf300bd45af04c6b99a31de37281a4423316e5c6a6 is not running

ab9e5fc98340 is cilium-lb, not lb-node.
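
Both logs show the same pattern: a command is retried with a `sleep` in between while the target container has already exited, so the job only fails once the overall timeout hits. A minimal sketch of a retry loop that fails fast instead is below. The helper name `wait_for_container_cmd` and its return codes are hypothetical and not taken from the Cilium CI scripts; the only Docker calls used are `docker inspect -f '{{.State.Running}}'` and `docker logs --tail`.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Cilium CI scripts).
# Retries a command once per second until it succeeds, but stops early
# when the named container is no longer running.
#   $1   container name to watch
#   $2   maximum number of attempts (roughly seconds)
#   $3.. command to retry
# Returns 0 on success, 1 on timeout, 2 if the container is gone.
wait_for_container_cmd() {
    local container timeout elapsed
    container=$1
    timeout=$2
    shift 2
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if "$@"; then
            return 0
        fi
        # `docker inspect` prints "true" while the container is running.
        # A missing container makes it fail, which is treated the same way.
        if [ "$(docker inspect -f '{{.State.Running}}' "$container" 2>/dev/null)" != "true" ]; then
            echo "container $container is not running, dumping recent logs" >&2
            docker logs --tail 50 "$container" >&2 || true
            return 2
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "timed out after ${timeout} attempts waiting for: $*" >&2
    return 1
}
```

Usage for the outer container would look like `wait_for_container_cmd lb-node 60 docker exec -t lb-node docker ps`. Because cilium-lb runs inside lb-node, watching it would require routing the `docker inspect` and `docker logs` calls through `docker exec lb-node`, which this sketch does not do. Dumping the logs at the point of failure is the main benefit, since the output above gives no hint as to why the container exited.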

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

github-actions[bot] commented 1 year ago

This issue has not seen any activity since it was marked stale. Closing.

pchaigno commented 10 months ago

This is still very much happening:

learnitall commented 9 months ago

Hit here: https://github.com/cilium/cilium/actions/runs/7545131697/job/20540073713 for PR https://github.com/cilium/cilium/pull/29796