alibaba / kubeskoop

Network monitoring & diagnosis suite for Kubernetes
https://kubeskoop.io
Apache License 2.0
556 stars, 66 forks

diagnose error: error run collector for pod: cannot collect pod default/test-2 on node k8s-1 #60

Open wymleo opened 1 year ago

wymleo commented 1 year ago

args: skoop -s 10.233.95.45 -d 10.64.7.92 -p 22 --format json --output a.json

pod_ip: 10.233.95.45
node_ip: 10.64.7.92
mode: calico ipip
pod_on_node_ip: 10.64.7.92

Lyt99 commented 1 year ago

Hi, could you please provide more information about this issue by following these steps?

  1. Rerun the command with -v=5 to produce verbose output, and with --preserve-collector-pod to keep the collector pods after the diagnosis finishes.

    skoop -s 10.233.95.45 -d 10.64.7.92 -p 22 --format json --output a.json -v=5 --preserve-collector-pod
  2. After the command finishes (fails), copy the entire output into this issue.

  3. Copy collector.json from the collector pod collector-k8s-1 in the skoop namespace. This file contains the network stack information collected from the target node.

    kubectl cp skoop/collector-k8s-1:data/collector.json ./collector.json

    This command should copy collector.json from the container to your local directory.

You should find your pod name test-2 and namespace default in this file. If you do not find them, there may be an issue in the pod collector, and we should fix it.

If you don't mind, you can also upload this file to the issue.

  4. After copying out the JSON file, you can manually delete the collector pod with:

    kubectl delete pod -n skoop collector-k8s-1
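
The check in step 3 (looking for the pod name in the copied collector.json) can be sketched in Python. The layout of collector.json is not described in this thread, so the sketch assumes nothing about its schema and simply walks the whole document, reporting every key or string value equal to the pod name. The helper names `find_string` and `pod_locations` are hypothetical and not part of skoop.

```python
import json
from typing import Any, Iterator, List


def find_string(node: Any, needle: str, path: str = "$") -> Iterator[str]:
    """Yield a JSON-path-like location for every key or string value equal to needle."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}"
            if key == needle:
                yield child
            yield from find_string(value, needle, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_string(value, needle, f"{path}[{index}]")
    elif isinstance(node, str) and node == needle:
        yield path


def pod_locations(collector_path: str, pod_name: str) -> List[str]:
    """Load the copied collector.json and list where pod_name appears (schema-agnostic)."""
    with open(collector_path, encoding="utf-8") as handle:
        return list(find_string(json.load(handle), pod_name))
```

For example, `pod_locations("./collector.json", "test-2")` returns an empty list when the pod is absent from the file, which is the situation the maintainer asks to have reported.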
wymleo commented 1 year ago

2. output:

    I0706 18:30:23.345039 1952248 cluster.go:84] Detected network plugin "calico".
    I0706 18:30:23.347033 1952248 cluster.go:99] Detected kube-proxy mode "ipvs"
    I0706 18:30:23.349959 1952248 cluster.go:118] Detected cluster cidr "10.233.64.0/18"
    I0706 18:30:23.389710 1952248 ip_cache.go:216] Pod kube-system/backup-etcd-28141560-bvs22 skipped, hostNetwork: true, phase: Succeeded, address: 10.64.6.87
    I0706 18:30:23.389743 1952248 ip_cache.go:216] Pod kube-system/backup-etcd-28143000-jf7pc skipped, hostNetwork: true, phase: Succeeded, address: 10.64.6.87
    I0706 18:30:23.389755 1952248 ip_cache.go:216] Pod kube-system/backup-master-28141560-46j64 skipped, hostNetwork: true, phase: Succeeded, address: 10.64.6.87
    I0706 18:30:23.389764 1952248 ip_cache.go:216] Pod kube-system/backup-master-28143000-whpv7 skipped, hostNetwork: true, phase: Succeeded, address: 10.64.6.87
    I0706 18:30:23.389770 1952248 ip_cache.go:216] Pod kube-system/calico-kube-controllers-b5c64bc76-xbrqs skipped, hostNetwork: true, phase: Running, address: 10.64.6.87
    I0706 18:30:23.389776 1952248 ip_cache.go:216] Pod kube-system/calico-node-w4wdz skipped, hostNetwork: true, phase: Running, address: 10.64.6.87
    I0706 18:30:23.389806 1952248 ip_cache.go:216] Pod kube-system/csi-nfs-controller-86b459789-m9ngg skipped, hostNetwork: true, phase: Running, address: 10.64.6.87
    I0706 18:30:23.389816 1952248 ip_cache.go:216] Pod kube-system/csi-nfs-node-q547g skipped, hostNetwork: true, phase: Running, address: 10.64.6.87
    I0706 18:30:23.389870 1952248 ip_cache.go:216] Pod kube-system/kube-apiserver-k8s-1 skipped, hostNetwork: true, phase: Running, address: 10.64.6.87
    I0706 18:30:23.389881 1952248 ip_cache.go:216] Pod kube-system/kube-controller-manager-k8s-1 skipped, hostNetwork: true, phase: Running, address: 10.64.6.87
    I0706 18:30:23.389886 1952248 ip_cache.go:216] Pod kube-system/kube-proxy-q5m8p skipped, hostNetwork: true, phase: Running, address: 10.64.6.87
    I0706 18:30:23.389892 1952248 ip_cache.go:216] Pod kube-system/kube-scheduler-k8s-1 skipped, hostNetwork: true, phase: Running, address: 10.64.6.87
    I0706 18:30:23.389910 1952248 ip_cache.go:216] Pod kube-system/metrics-server-7bdbcc8678-9jdkk skipped, hostNetwork: true, phase: Running, address: 10.64.6.87
    I0706 18:30:23.389931 1952248 ip_cache.go:216] Pod kube-system/nodelocaldns-96xd5 skipped, hostNetwork: true, phase: Running, address: 10.64.6.87
    I0706 18:30:23.390012 1952248 ip_cache.go:216] Pod kubegien-logging-system/elasticsearch-logging-curator-elasticsearch-curator-281419qfbht skipped, hostNetwork: false, phase: Succeeded, address: 10.233.95.43
    I0706 18:30:23.390023 1952248 ip_cache.go:216] Pod kubegien-logging-system/elasticsearch-logging-curator-elasticsearch-curator-281434gd672 skipped, hostNetwork: false, phase: Succeeded, address: 10.233.95.46
    I0706 18:30:23.390194 1952248 ip_cache.go:216] Pod kubegien-monitoring-system/node-exporter-5qfhl skipped, hostNetwork: true, phase: Running, address: 10.64.6.87
    I0706 18:30:23.408976 1952248 manager.go:311] Creating pod on node k8s-1 with image kubeskoop/kubeskoop:v0.1.0
    I0706 18:30:23.423764 1952248 manager.go:456] Waiting pod skoop/collector-k8s-1 running, current status: Pending
    I0706 18:30:25.427741 1952248 manager.go:456] Waiting pod skoop/collector-k8s-1 running, current status: Pending
    I0706 18:30:27.426865 1952248 manager.go:456] Waiting pod skoop/collector-k8s-1 running, current status: Pending
    I0706 18:30:29.427123 1952248 manager.go:456] Waiting pod skoop/collector-k8s-1 running, current status: Pending
    I0706 18:30:31.427768 1952248 manager.go:456] Waiting pod skoop/collector-k8s-1 running, current status: Pending
    I0706 18:30:33.427615 1952248 manager.go:456] Waiting pod skoop/collector-k8s-1 running, current status: Running
    I0706 18:30:33.427653 1952248 manager.go:489] Trying read collector data from pod skoop/collector-k8s-1
    F0706 18:30:33.618143 1952248 app.go:46] diagnose error: error run collector for pod: cannot collect pod default/wym-test-2 on node k8s-1
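
Verbose skoop output uses the klog header layout (severity letter, month and day, time, process id, source file and line, then `]` and the message), and it can lose its line breaks when pasted into an issue. The sketch below, assuming that header layout, splits such text back into entries so the fatal (`F`) line is easy to pick out. The names `Entry` and `split_klog` are hypothetical helpers, not part of skoop or klog.

```python
import re
from typing import List, NamedTuple

# klog header: severity, mmdd, hh:mm:ss.uuuuuu, process id, file:line, then "] ".
KLOG_HEADER = re.compile(
    r"(?P<severity>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<pid>\d+) (?P<source>[\w.-]+:\d+)\] "
)


class Entry(NamedTuple):
    severity: str
    source: str
    message: str


def split_klog(text: str) -> List[Entry]:
    """Split klog output into entries, even when newlines were lost or misplaced."""
    text = " ".join(text.split())  # collapse stray line breaks and repeated spaces
    matches = list(KLOG_HEADER.finditer(text))
    entries = []
    for current, following in zip(matches, matches[1:] + [None]):
        # Each message runs from the end of its header to the start of the next header.
        end = following.start() if following else len(text)
        message = text[current.end():end].strip()
        entries.append(Entry(current["severity"], current["source"], message))
    return entries
```

Filtering the result with `[e for e in split_klog(output) if e.severity == "F"]` leaves only the fatal entries, here the `app.go:46` line reporting that pod default/wym-test-2 could not be collected on node k8s-1.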

wymleo commented 1 year ago

3. Cannot find wym-test-2 in collector.json. [Uploading collector.json.txt…]()

wymleo commented 1 year ago

Sorry, the previous info was wrong. node_ip: 10.64.6.87, pod_on_node_ip: 10.64.6.87
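
As a quick sanity check on the addresses in this thread, they can be compared against the cluster CIDR that skoop reported in its log ("10.233.64.0/18"). This is a minimal sketch using Python's standard `ipaddress` module. Treating an address outside the cluster CIDR as a node address is an assumption made for illustration, not something skoop is confirmed to do, and `in_cluster_cidr` is a hypothetical helper.

```python
import ipaddress

# Taken from the 'Detected cluster cidr "10.233.64.0/18"' log line in this thread.
CLUSTER_CIDR = ipaddress.ip_network("10.233.64.0/18")


def in_cluster_cidr(address: str) -> bool:
    """Return True when the address falls inside the detected cluster CIDR."""
    return ipaddress.ip_address(address) in CLUSTER_CIDR


if __name__ == "__main__":
    for label, address in [
        ("source (-s)", "10.233.95.45"),
        ("destination (-d)", "10.64.7.92"),
        ("corrected node_ip", "10.64.6.87"),
    ]:
        kind = "inside cluster CIDR" if in_cluster_cidr(address) else "outside cluster CIDR"
        print(f"{label}: {address} is {kind}")
```

With the values above, only the source address lies inside the cluster CIDR. Both the original destination and the corrected node_ip lie outside it.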