cilium / cilium-cli

CLI to install, manage & troubleshoot Kubernetes clusters running Cilium
https://cilium.io
Apache License 2.0

Use ephemeral containers during sysdump if Cilium is stuck in crashloop #1867

Open jrajahalme opened 3 years ago

jrajahalme commented 3 years ago

Currently, bugtool information for the Cilium agent is missing from the sysdump when the agent is in a crash loop. A lot of helpful information (e.g., open sockets, iptables rules, etc.) could also be collected from nodes where the Cilium agent fails to start. Would it be possible to run a job on the node with a bugtool/bpftool image to collect the current node state when the Cilium pod fails to start?
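Before collecting anything extra, a sysdump would need to identify which agent pods are crashlooping. Below is a minimal sketch using only the Go standard library. It assumes the pod object is available as JSON, for example from `kubectl get pod -n kube-system <pod> -o json`. The helper `isCrashLooping` is hypothetical and not part of cilium-cli. It checks the standard Kubernetes field `status.containerStatuses[].state.waiting.reason` (and the init-container equivalent) for `CrashLoopBackOff`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// containerStatus mirrors the subset of the Kubernetes ContainerStatus
// fields needed here; unknown fields in the JSON are ignored.
type containerStatus struct {
	Name  string `json:"name"`
	State struct {
		Waiting *struct {
			Reason string `json:"reason"`
		} `json:"waiting"`
	} `json:"state"`
}

// pod mirrors the subset of the Kubernetes Pod object used here.
type pod struct {
	Status struct {
		InitContainerStatuses []containerStatus `json:"initContainerStatuses"`
		ContainerStatuses     []containerStatus `json:"containerStatuses"`
	} `json:"status"`
}

// isCrashLooping (hypothetical helper) reports whether any init or regular
// container of the pod is waiting with reason CrashLoopBackOff, and returns
// the name of the first such container.
func isCrashLooping(podJSON []byte) (bool, string, error) {
	var p pod
	if err := json.Unmarshal(podJSON, &p); err != nil {
		return false, "", err
	}
	// Copy into a fresh slice so the decoded slices are not aliased.
	statuses := append([]containerStatus{}, p.Status.InitContainerStatuses...)
	statuses = append(statuses, p.Status.ContainerStatuses...)
	for _, cs := range statuses {
		if cs.State.Waiting != nil && cs.State.Waiting.Reason == "CrashLoopBackOff" {
			return true, cs.Name, nil
		}
	}
	return false, "", nil
}

func main() {
	// Illustrative pod status, not real cluster output.
	sample := []byte(`{
  "status": {
    "containerStatuses": [
      {"name": "cilium-agent", "state": {"waiting": {"reason": "CrashLoopBackOff"}}}
    ]
  }
}`)
	looping, container, err := isCrashLooping(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(looping, container) // prints "true cilium-agent"
}
```

Only pods flagged this way would need the extra node-level collection. Healthy agents can keep using the existing bugtool path.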

aanm commented 3 years ago


Yes, it is possible if a) the cluster supports ephemeral containers (https://kubernetes.io/docs/concepts/workloads/pods/ephemeral-containers/), or b) we run a Deployment on the node(s) selected by a specific label, or even on all nodes, that runs the bugtool on those nodes.
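Option a) could reuse the mechanism that `kubectl debug` relies on: add an entry to the pod's `spec.ephemeralContainers` through the `ephemeralcontainers` subresource. The sketch below only builds the request body, using the Go standard library. The assumptions are:

- Kubernetes 1.23+, where ephemeral containers are enabled by default.
- The body is sent as `PATCH /api/v1/namespaces/{ns}/pods/{pod}/ephemeralcontainers` with `Content-Type: application/strategic-merge-patch+json`. The HTTP call is omitted.
- The container name, the image (`example.invalid/debug-tools:latest`) and the command are hypothetical placeholders.
- The image is assumed to ship `iptables-save` and `ss`.

Because the Cilium agent pod runs with the host network namespace, an ephemeral container added to it should see the node's sockets and iptables rules.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ephemeralContainer mirrors a subset of the Kubernetes core/v1
// EphemeralContainer fields.
type ephemeralContainer struct {
	Name            string           `json:"name"`
	Image           string           `json:"image"`
	Command         []string         `json:"command,omitempty"`
	SecurityContext *securityContext `json:"securityContext,omitempty"`
}

type securityContext struct {
	Privileged *bool `json:"privileged,omitempty"`
}

// ephemeralPatch builds a strategic-merge-patch body that adds one ephemeral
// container to an existing pod. The securityContext is only set when
// privileged is true (reading iptables typically needs extra capabilities).
func ephemeralPatch(name, image string, command []string, privileged bool) ([]byte, error) {
	ec := ephemeralContainer{Name: name, Image: image, Command: command}
	if privileged {
		ec.SecurityContext = &securityContext{Privileged: &privileged}
	}
	body := map[string]any{
		"spec": map[string]any{
			"ephemeralContainers": []ephemeralContainer{ec},
		},
	}
	return json.Marshal(body)
}

func main() {
	// Placeholder image and command (assumptions, see the lead-in).
	patch, err := ephemeralPatch(
		"sysdump-debug",
		"example.invalid/debug-tools:latest",
		[]string{"sh", "-c", "iptables-save; ss -tanp"},
		true,
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```

Since the command writes to stdout, its output could then be retrieved with `kubectl logs -n <namespace> <pod> -c sysdump-debug`, even if the agent container in the same pod keeps restarting. Whether a cluster's admission policy allows a privileged ephemeral container is a separate question that a real implementation would have to handle.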

christarazi commented 1 year ago

^ Good idea. I've updated the issue to reflect this feature request and am transferring it to the CLI repo, since that's where sysdump lives.

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
