aws / eks-charts

Amazon EKS Helm chart repository
Apache License 2.0

aws-for-fluent-bit : Upgrading chart version leading to Liveness Probe failed #995

Open jatinmehrotra opened 1 year ago

jatinmehrotra commented 1 year ago

Describe the bug

Until now the Fluent Bit pods were working fine, but the moment I updated the chart from 0.1.19 to 0.1.29 they entered a CrashLoopBackOff state, due to failures in the liveness probe newly introduced in https://github.com/aws/eks-charts/pull/975.

Pod events show the following message:

Liveness probe failed: HTTP Probe failed with statuscode: 500

Steps to reproduce

Spin up an IPv4 EKS cluster and install the aws-for-fluent-bit chart at version 0.1.29. The pods will enter CrashLoopBackOff.

Expected outcome

The liveness probe should pass with the updated chart configuration.

Environment

Chart name: aws-for-fluent-bit
Chart version: 0.1.29
Kubernetes version: 1.25
Using EKS (yes/no), if so version? Yes, v1.25.12-eks-2d98532

Additional Context:

Note: the pods are running on EC2 nodes.

jatinmehrotra commented 1 year ago

The liveness probe was introduced in this pull request: https://github.com/aws/eks-charts/pull/975. I am wondering whether this is a port issue, where the probe requests are unable to reach port 2020.

(I may be wrong, since I don't know the exact internals of how the Fluent Bit server establishes network connectivity for probes.)
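
One way to tell a port problem apart from an unhealthy server is to send the same kind of HTTP GET the probe sends and see whether any HTTP response comes back. A status code such as 500 means a server answered on the port, whereas a refused connection or timeout means the request never reached one. The following is only a minimal sketch: the `probe` helper is hypothetical, the `/api/v1/health` path is an assumption about what the chart's probe targets, and it assumes port 2020 of a pod has been forwarded to the local machine (for example with `kubectl port-forward`).

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 2.0) -> str:
    """Classify one HTTP GET as 'reachable' (any HTTP response) or 'unreachable'."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"reachable, status {resp.status}"
    except urllib.error.HTTPError as err:
        # The server answered with an error status, so the port itself is reachable.
        return f"reachable, status {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # No HTTP response at all: refused connection, timeout, DNS failure, etc.
        return f"unreachable: {err}"


if __name__ == "__main__":
    # Assumption: port 2020 of a Fluent Bit pod is forwarded to localhost,
    # and the health endpoint lives at /api/v1/health.
    print(probe("http://127.0.0.1:2020/api/v1/health"))
```

If this reports a reachable server with status 500, the port is not the problem and the failure lies in what the health endpoint itself reports.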

jbeemster commented 1 year ago

We are seeing the same issue, though it's somewhat sporadic: when scaling up an EKS cluster, about 30-40% of Fluent Bit pods enter this crash loop and never seem to pass the health checks. The other pods do pass and run fine.