eldadru / ksniff

Kubectl plugin to ease sniffing on kubernetes pods using tcpdump and wireshark
Apache License 2.0

'kubectl sniff' command returning 139 exit/error code during execution. RCA required for failed attempt at packet capture so that workaround can be identified. #173

Open Sayantan-Dell opened 1 year ago

Sayantan-Dell commented 1 year ago

In the same environment and the same Kubernetes cluster, kubectl sniff works for one pod but not for another; the evidence is below. I cannot work out the root cause of exit/error code 139. Can anyone help with this and suggest a possible workaround?

Failure Scenario : For first POD

root@node1:/# kubectl krew version
OPTION            VALUE
GitTag            v0.4.4
GitCommit         343e657
IndexURI          https://github.com/kubernetes-sigs/krew-index.git
BasePath          /root/.krew
IndexPath         /root/.krew/index/default
InstallPath       /root/.krew/store
BinPath           /root/.krew/bin
DetectedPlatform  linux/amd64

root@node1:/# kubectl get pods -n Test-upf1
NAME                   READY   STATUS    RESTARTS   AGE
upf-5896cf6b4c-9shws   3/3     Running   0          15d

root@node1:/# kubectl get pods/upf-5896cf6b4c-9shws -o jsonpath='{.spec.containers[*].name}' -n Test-upf1
upfsp upffp upfrsyslog

root@node1:/# kubectl sniff upf-5896cf6b4c-9shws -n Test-upf1 -o /tmp/upf.pcap
INFO[0000] using tcpdump path at: '/root/.krew/store/sniff/v1.6.2/static-tcpdump'
INFO[0000] no container specified, taking first container we found in pod.
INFO[0000] selected container: 'upfsp'
INFO[0000] sniffing method: upload static tcpdump
INFO[0000] sniffing on pod: 'upf-5896cf6b4c-9shws' [namespace: 'Test-upf1', container: 'upfsp', filter: '', interface: 'any']
INFO[0000] uploading static tcpdump binary from: '/root/.krew/store/sniff/v1.6.2/static-tcpdump' to: '/tmp/static-tcpdump'
INFO[0000] uploading file: '/root/.krew/store/sniff/v1.6.2/static-tcpdump' to '/tmp/static-tcpdump' on container: 'upfsp'
INFO[0000] executing command: '[/bin/sh -c test -f /tmp/static-tcpdump]' on container: 'upfsp', pod: 'upf-5896cf6b4c-9shws', namespace: 'Test-upf1'
INFO[0000] command: '[/bin/sh -c test -f /tmp/static-tcpdump]' executing successfully exitCode: '0', stdErr :''
INFO[0000] file found: ''
INFO[0000] file was already found on remote pod
INFO[0000] tcpdump uploaded successfully
INFO[0000] output file option specified, storing output in: '/tmp/upf.pcap'
INFO[0000] start sniffing on remote container
INFO[0000] executing command: '[/tmp/static-tcpdump -i any -U -w - ]' on container: 'upfsp', pod: 'upf-5896cf6b4c-9shws', namespace: 'Test-upf1'
INFO[0000] command: '[/tmp/static-tcpdump -i any -U -w - ]' executing successfully exitCode: '139', stdErr :''
INFO[0000] starting sniffer cleanup
INFO[0000] sniffer cleanup completed successfully
Error: executing sniffer failed, exit code: '139'

=========================================

Success Scenario : For other PODs

root@node1:/# kubectl get pods -n Test-udm1
NAME                        READY   STATUS    RESTARTS   AGE
udm-ee-79c897c869-9pt9r     2/2     Running   0          15d
udm-sdm-5d75ff8775-54lsf    2/2     Running   0          15d
udm-ueau-67944949f5-rwd82   2/2     Running   0          15d
udm-uecm-76fcf7c57-c8cbs    2/2     Running   0          15d

root@node1:/# kubectl get pods/udm-ueau-67944949f5-rwd82 -o jsonpath='{.spec.containers[*].name}' -n Test-udm1
udm-ueau istio-proxy

root@node1:/# kubectl sniff udm-ueau-67944949f5-rwd82 -n Test-udm1 -o /tmp/udm.pcap
INFO[0000] using tcpdump path at: '/root/.krew/store/sniff/v1.6.2/static-tcpdump'
INFO[0000] no container specified, taking first container we found in pod.
INFO[0000] selected container: 'udm-ueau'
INFO[0000] sniffing method: upload static tcpdump
INFO[0000] sniffing on pod: 'udm-ueau-67944949f5-rwd82' [namespace: 'Test-udm1', container: 'udm-ueau', filter: '', interface: 'any']
INFO[0000] uploading static tcpdump binary from: '/root/.krew/store/sniff/v1.6.2/static-tcpdump' to: '/tmp/static-tcpdump'
INFO[0000] uploading file: '/root/.krew/store/sniff/v1.6.2/static-tcpdump' to '/tmp/static-tcpdump' on container: 'udm-ueau'
INFO[0000] executing command: '[/bin/sh -c test -f /tmp/static-tcpdump]' on container: 'udm-ueau', pod: 'udm-ueau-67944949f5-rwd82', namespace: 'Test-udm1'
INFO[0000] command: '[/bin/sh -c test -f /tmp/static-tcpdump]' executing successfully exitCode: '0', stdErr :''
INFO[0000] file found: ''
INFO[0000] file was already found on remote pod
INFO[0000] tcpdump uploaded successfully
INFO[0000] output file option specified, storing output in: '/tmp/udm.pcap'
INFO[0000] start sniffing on remote container
INFO[0000] executing command: '[/tmp/static-tcpdump -i any -U -w - ]' on container: 'udm-ueau', pod: 'udm-ueau-67944949f5-rwd82', namespace: 'Test-udm1'
^C
root@node1:/#

imscuevas commented 1 year ago

I am getting the same issue with the nginx image. To debug it, I connected to the container and ran the same command that kubectl sniff uses:

/tmp/static-tcpdump -i any -U -w -
Segmentation fault (core dumped)

After that, I installed tcpdump in the container with apt update && apt install -y tcpdump, and the packaged binary works:

tcpdump -i any -U -w -
tcpdump: data link type LINUX_SLL2
?ò?tcpdump: listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes

I am wondering if the way that the static-tcpdump is compiled is causing this issue.
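
For readers wondering about the number itself: shells conventionally report a process that was terminated by a signal as 128 plus the signal number, and 139 is 128 + 11, where signal 11 is SIGSEGV on Linux. That is consistent with the "Segmentation fault" message above. The following is a minimal sketch of that arithmetic; decode_exit_status is a hypothetical helper name, not part of ksniff or kubectl.

```shell
#!/bin/sh
# Hypothetical helper: interpret a shell exit status.
# Convention: a status above 128 means "terminated by signal (status - 128)".
decode_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        sig=$((status - 128))
        echo "killed by signal $sig"
    else
        echo "exited with status $status"
    fi
}

# The status reported by kubectl sniff in this issue.
decode_exit_status 139
```

So the exit code only says that static-tcpdump crashed inside the container; it does not by itself say why.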

Sayantan-Dell commented 1 year ago

OK, so it looks like the issue happens when a pod has multiple containers. Thanks for the information; I will try again after installing tcpdump in the container.

imscuevas commented 1 year ago

As far as I remember my pod only has 1 container

k get pods
NAME                      READY   STATUS    RESTARTS   AGE
alpine-654bf79686-jbrjt   1/1     Running   0          24h
ksniff-j446x              1/1     Running   0          25h
ksniff-j6fv6              1/1     Running   0          25h
ksniff-vjmd4              1/1     Running   0          25h
nginx-77b4fdf86c-5qcc2    1/1     Running   0          25h
nginx-77b4fdf86c-5tcl8    1/1     Running   0          29h
nginx-77b4fdf86c-jhzdc    1/1     Running   0          25h

When I run ksniff against any of the nginx pods, it fails:

k sniff nginx-77b4fdf86c-5qcc2 -n default
INFO[0000] using tcpdump path at: '/Users/scuevas/.krew/store/sniff/v1.6.2/static-tcpdump' 
INFO[0000] no container specified, taking first container we found in pod. 
INFO[0000] selected container: 'nginx'                  
INFO[0000] sniffing method: upload static tcpdump       
INFO[0000] sniffing on pod: 'nginx-77b4fdf86c-5qcc2' [namespace: 'default', container: 'nginx', filter: '', interface: 'any'] 
INFO[0000] uploading static tcpdump binary from: '/Users/scuevas/.krew/store/sniff/v1.6.2/static-tcpdump' to: '/tmp/static-tcpdump' 
INFO[0000] uploading file: '/Users/scuevas/.krew/store/sniff/v1.6.2/static-tcpdump' to '/tmp/static-tcpdump' on container: 'nginx' 
INFO[0000] executing command: '[/bin/sh -c test -f /tmp/static-tcpdump]' on container: 'nginx', pod: 'nginx-77b4fdf86c-5qcc2', namespace: 'default' 
INFO[0000] command: '[/bin/sh -c test -f /tmp/static-tcpdump]' executing successfully exitCode: '0', stdErr :'' 
INFO[0000] file found: ''                               
INFO[0000] file was already found on remote pod         
INFO[0000] tcpdump uploaded successfully                
INFO[0000] spawning wireshark!                          
INFO[0000] start sniffing on remote container           
INFO[0000] executing command: '[/tmp/static-tcpdump -i any -U -w - ]' on container: 'nginx', pod: 'nginx-77b4fdf86c-5qcc2', namespace: 'default' 
INFO[0001] command: '[/tmp/static-tcpdump -i any -U -w - ]' executing successfully exitCode: '139', stdErr :'' 
ERRO[0001] failed to start remote sniffing, stopping wireshark  error="executing sniffer failed, exit code: '139'"
INFO[0001] starting sniffer cleanup                     
INFO[0001] sniffer cleanup completed successfully       
Error: signal: killed

This does not happen with the alpine pod; Wireshark opened without any issue.

k sniff alpine-654bf79686-jbrjt -n default
INFO[0000] using tcpdump path at: '/Users/scuevas/.krew/store/sniff/v1.6.2/static-tcpdump' 
INFO[0000] no container specified, taking first container we found in pod. 
INFO[0000] selected container: 'alpine'                 
INFO[0000] sniffing method: upload static tcpdump       
INFO[0000] sniffing on pod: 'alpine-654bf79686-jbrjt' [namespace: 'default', container: 'alpine', filter: '', interface: 'any'] 
INFO[0000] uploading static tcpdump binary from: '/Users/scuevas/.krew/store/sniff/v1.6.2/static-tcpdump' to: '/tmp/static-tcpdump' 
INFO[0000] uploading file: '/Users/scuevas/.krew/store/sniff/v1.6.2/static-tcpdump' to '/tmp/static-tcpdump' on container: 'alpine' 
INFO[0000] executing command: '[/bin/sh -c test -f /tmp/static-tcpdump]' on container: 'alpine', pod: 'alpine-654bf79686-jbrjt', namespace: 'default' 
INFO[0000] command: '[/bin/sh -c test -f /tmp/static-tcpdump]' executing successfully exitCode: '0', stdErr :'' 
INFO[0000] file found: ''                               
INFO[0000] file was already found on remote pod         
INFO[0000] tcpdump uploaded successfully                
INFO[0000] spawning wireshark!                          
INFO[0000] start sniffing on remote container           
INFO[0000] executing command: '[/tmp/static-tcpdump -i any -U -w - ]' on container: 'alpine', pod: 'alpine-654bf79686-jbrjt', namespace: 'default'

imscuevas commented 1 year ago

In case you want to reproduce this, here are the manifests for my test deployments:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  labels:
    app: alpine
  name: alpine
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: alpine
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: alpine
    spec:
      containers:
      - command:
        - sleep
        - infinity
        image: alpine
        imagePullPolicy: Always
        name: alpine
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  labels:
    app: nginx
  name: nginx
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 3
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30

zoopp commented 8 months ago

For anyone looking for a workaround, what worked for me was to rebuild (the latest version of) static tcpdump and use that instead of what's shipped with the plugin:

  1. Start an alpine container: podman run -v $PWD:/out --rm -it alpine (or use docker instead of podman).
  2. Install dependencies: apk add --update alpine-sdk git libpcap libpcap-dev.
  3. Clone ksniff: cd /tmp; git clone https://github.com/eldadru/ksniff; cd ksniff.
  4. Update tcpdump version in the Makefile (e.g. set TCPDUMP_VERSION=4.99.4).
  5. Build it: make static-tcpdump.
  6. Copy the binary to the host system: cp static-tcpdump /out and exit the container.
  7. Overwrite static-tcpdump from the kubectl plugin: cp static-tcpdump ~/.krew/store/sniff/<version>/

If the old version of static-tcpdump is present at /tmp/static-tcpdump in the pod container then you may need to remove it manually.
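
Steps 2 to 6 above can be sketched as a small script to run inside the Alpine container. This is only a sketch under assumptions: set_tcpdump_version and rebuild_static_tcpdump are hypothetical helper names, and the sed expression assumes the Makefile assigns the variable at the start of a line in the form TCPDUMP_VERSION=<value>. If the Makefile is laid out differently, edit it by hand as described in step 4.

```shell
#!/bin/sh
# Sketch of the rebuild steps, meant to run inside the Alpine container
# started in step 1 (with the host directory mounted at /out).

# Hypothetical helper: rewrite the TCPDUMP_VERSION assignment in a Makefile.
# Assumption: the variable is assigned at the start of a line as
# TCPDUMP_VERSION=<value>.
set_tcpdump_version() {
    makefile=$1
    version=$2
    sed -i "s/^TCPDUMP_VERSION *=.*/TCPDUMP_VERSION=${version}/" "$makefile"
}

# Hypothetical wrapper around steps 2-6; stops at the first failing step.
rebuild_static_tcpdump() {
    version=${1:-4.99.4}
    apk add --update alpine-sdk git libpcap libpcap-dev &&
    cd /tmp &&
    git clone https://github.com/eldadru/ksniff &&
    cd ksniff &&
    set_tcpdump_version Makefile "$version" &&
    make static-tcpdump &&
    cp static-tcpdump /out
}
```

Calling rebuild_static_tcpdump 4.99.4 inside the container should leave the binary in /out. Step 7 (copying it into the krew store on the host) and removing any stale /tmp/static-tcpdump in the pod still have to be done separately.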

crezy8 commented 8 months ago

It works, thanks

thesn10 commented 3 months ago

Workaround for Debian-based containers:

kubectl exec -i -t your-pod -- bash -c "apt update && apt install tcpdump -y && \
    rm /tmp/static-tcpdump && \
    ln /bin/tcpdump /tmp/static-tcpdump"
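
A variant of the same idea, sketched under assumptions: instead of hard-coding /bin/tcpdump, resolve whatever path the package manager installed, and use a symbolic link so that it also works when /tmp is on a different filesystem. link_system_tcpdump is a hypothetical helper name to run inside the container after tcpdump has been installed. That ksniff is happy with a symlink at /tmp/static-tcpdump is an assumption based on the test -f check visible in the logs above, which follows symlinks.

```shell
#!/bin/sh
# Hypothetical helper, to be run inside the target container after
# tcpdump has been installed from the distribution's packages.
link_system_tcpdump() {
    target=${1:-/tmp/static-tcpdump}
    # Resolve the installed tcpdump instead of assuming /bin/tcpdump.
    real=$(command -v tcpdump) || {
        echo "tcpdump not found in PATH" >&2
        return 1
    }
    # -s: symbolic link, works across filesystems.
    # -f: replace a stale static-tcpdump if one is already there.
    ln -sf "$real" "$target"
}
```

With no argument the helper links to /tmp/static-tcpdump, which is the path ksniff checks for in the logs in this thread.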

vl-kp commented 2 months ago

root@nginx:/# /tmp/static-tcpdump -i any -U -w -
Segmentation fault (core dumped)