elastic / e2e-testing

Formal verification of Elastic-Agent and more using BDD

Flaky Test [Initializing / End-To-End Tests / helm_debian_amd64_metricbeat / The Metricbeat chart will create recommended K8S resources – Metricbeat] #2584

Closed: elasticmachine closed this issue 2 years ago

elasticmachine commented 2 years ago

Flaky Test

Error details

Step the "metricbeat" Elastic"s helm chart is installed
jlind23 commented 2 years ago

@mdelapenya the error here is:

[2022-05-31T13:58:21.060Z] ERRO[2022-05-31T13:58:20Z] Error executing command args="[delete ]" baseDir=. command=helm env="map[]" error="exit status 1" stderr="Error: Kubernetes cluster unreachable\n"

How can I check further? https://beats-ci.elastic.co/blue/organizations/jenkins/e2e-tests%2Fe2e-testing-mbp%2F7.17/detail/7.17/712/pipeline#:~:text=%5B2022%2D05%2D31T13%3A58%3A21.060Z%5D%20ERRO%5B2022%2D05%2D31T13%3A58%3A20Z%5D%20Error%20executing%20command%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20args%3D%22%5Bdelete%20%5D%22%20baseDir%3D.%20command%3Dhelm%20env%3D%22map%5B%5D%22%20error%3D%22exit%20status%201%22%20stderr%3D%22Error%3A%20Kubernetes%20cluster%20unreachable%5Cn%22
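
For reference, a minimal sketch of how one might check further from the CI workspace, assuming the suite provisions its cluster with kind and names it helm-charts-test-suite (that name is inferred from the control-plane container that shows up in the kubeadm output quoted below); these commands are not taken from the build itself:

    # Is the kind cluster (still) there, and is its node container running?
    kind get clusters
    docker ps --filter name=helm-charts-test-suite-control-plane

    # Can kubectl actually reach the API server? "Kubernetes cluster unreachable"
    # from helm usually means these fail too.
    kubectl cluster-info --context kind-helm-charts-test-suite
    kubectl get nodes --context kind-helm-charts-test-suite

    # What does helm itself think is installed in that cluster?
    helm ls --all-namespaces --kube-context kind-helm-charts-test-suite

    # Collect node logs, the kubelet journal, and pod logs for the CI archive.
    kind export logs --name helm-charts-test-suite ./kind-logs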

cmacknz commented 2 years ago

I see this:

https://helm-charts-test-suite-control-plane:6443/healthz?timeout=10s  in 0 milliseconds
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.

    Unfortunately, an error has occurred:
        timed out waiting for the condition

    This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

    Additionally, a control plane component may have crashed or exited when started by the container runtime.
    To troubleshoot, list all containers using your preferred container runtimes CLI.

    Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
        - 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'

couldn't initialize a Kubernetes cluster
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init.runWaitControlPlanePhase
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/init/waitcontrolplane.go:114
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:234
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:422
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdInit.func1
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:147
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:826
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
k8s.io/kubernetes/cmd/kubeadm/app.Run
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
    _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
    /usr/local/go/src/runtime/proc.go:203
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1357
error execution phase wait-control-plane
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:235
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:422
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:207
k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdInit.func1
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/init.go:147
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:826
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
k8s.io/kubernetes/cmd/kubeadm/app.Run
    /go/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
    _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
    /usr/local/go/src/runtime/proc.go:203
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1357

Specifically, the [kubelet-check] lines say "It seems like the kubelet isn't running or healthy.", so Kubernetes itself is unhealthy here. Possibly the node/VM/worker this test runs on is underpowered for the test, or there is some configuration or deployment problem hidden here.

Or even more fundamentally, the kubelet never started.
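
A sketch of how one could confirm that. These are essentially the checks kubeadm itself suggests above, but run inside the kind node container, since the "node" here is the Docker container named in the log (the container name and the resource checks for the underpowered-worker theory are assumptions, not taken from the build):

    NODE=helm-charts-test-suite-control-plane   # assumed from the kubeadm output above

    # Did the kubelet service ever start, and why did it stop?
    docker exec "$NODE" systemctl status kubelet
    docker exec "$NODE" journalctl -xeu kubelet

    # Did any control-plane container start at all under containerd?
    docker exec "$NODE" crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a

    # Quick look at memory/CPU/disk pressure on the CI worker itself.
    free -h
    nproc
    df -h /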

mdelapenya commented 2 years ago

I'd close this issue, as there have been 23 days of green builds since that failure.