apache / pulsar-helm-chart

Official Apache Pulsar Helm Chart
https://pulsar.apache.org/
Apache License 2.0

pulsar-broker failed to start with both Liveness probe and Readiness probes failed: HTTP probe failed with statuscode: 404 #501

Closed jing-c-tyagi closed 5 months ago

jing-c-tyagi commented 5 months ago

We are running Pulsar chart version pulsar-3.3.1 (app version 3.0.3) in AWS EKS.

$ helm list -n pulsar
NAME    NAMESPACE  REVISION  UPDATED                                STATUS    CHART         APP VERSION
pulsar  pulsar     8         2024-05-24 13:10:17.2221715 -0400 EDT  deployed  pulsar-3.3.1  3.0.3

Due to an application-side error, we tried to restart the Pulsar pods. We ran kubectl scale statefulsets for pulsar-broker, but the pods failed to start and ended up in CrashLoopBackOff status. Describing the pod shows "Liveness probe failed: HTTP probe failed with statuscode: 404" and "Readiness probe failed: HTTP probe failed with statuscode: 404".

To Reproduce

$ kubectl scale statefulsets pulsar-broker --replicas=0 -n pulsar

then

$ kubectl scale statefulsets pulsar-broker --replicas=3 -n pulsar

But the pulsar-broker pods failed to start:

$ kgpo -n pulsar

Describing the pulsar-broker-2 pod shows the following events:

Events:
Type     Reason     Age                    From               Message
Normal   Scheduled  29m                    default-scheduler  Successfully assigned pulsar/pulsar-broker-2 to ip-10-249-18-54.ec2.internal
Normal   Pulled     29m                    kubelet            Container image "apachepulsar/pulsar-all:3.0.3" already present on machine
Normal   Created    29m                    kubelet            Created container wait-zookeeper-ready
Normal   Started    29m                    kubelet            Started container wait-zookeeper-ready
Normal   Pulled     29m                    kubelet            Container image "apachepulsar/pulsar-all:3.0.3" already present on machine
Normal   Created    29m                    kubelet            Created container wait-bookkeeper-ready
Normal   Started    29m                    kubelet            Started container wait-bookkeeper-ready
Normal   Pulled     29m (x2 over 29m)      kubelet            Container image "apachepulsar/pulsar-all:3.0.3" already present on machine
Normal   Created    29m (x2 over 29m)      kubelet            Created container pulsar-broker
Normal   Started    29m (x2 over 29m)      kubelet            Started container pulsar-broker
Warning  Unhealthy  28m (x2 over 28m)      kubelet            Liveness probe failed: Get "http://100.64.15.103:8080/status.html": dial tcp 100.64.15.103:8080: connect: connection refused
Warning  Unhealthy  28m (x2 over 28m)      kubelet            Readiness probe failed: Get "http://100.64.15.103:8080/status.html": dial tcp 100.64.15.103:8080: connect: connection refused
Warning  Unhealthy  28m (x4 over 28m)      kubelet            Liveness probe failed: HTTP probe failed with statuscode: 404
Warning  Unhealthy  24m (x18 over 28m)     kubelet            Readiness probe failed: HTTP probe failed with statuscode: 404
Warning  BackOff    4m39s (x78 over 27m)   kubelet            Back-off restarting failed container pulsar-broker in pod pulsar-broker-2_pulsar(a41a0f02-17bd-48ca-9e8f-a93afc673fc4)

$ k logs -n pulsar pulsar-broker-2 --tail=100 -p

jing-c-tyagi commented 5 months ago

I opened a shell in the pod during the short time it was in Running status and got the following: ver-nextrelease-pulsar-broker.txt

lhotari commented 5 months ago

I believe that this is fixed by #489, which is going to be released with version 3.4.1 of the chart. The release is currently in voting: https://lists.apache.org/thread/4stgyrtsnj9jhrvw0b1t8bqr4knj8bfx

lhotari commented 5 months ago

@jing-c-tyagi Pulsar Helm Chart 3.4.1 has been released. This problem should be fixed with #489 which is part of 3.4.1 release. I'll close the issue. Please reopen if the problem persists.

lhotari commented 5 months ago

Actually, I now noticed that the error message contains "Error e reading ledger - ledger=74597 - operation=Failed to read entry - entry=0". That's a sign of a corrupted or missing ledger, which you will need to repair somehow. One solution is to accept the data loss and let Pulsar continue while ignoring the problem by setting autoSkipNonRecoverableData=true. That's a dangerous setting because it results in data loss. In this case, however, the error is about the persistent://public/functions/assignments topic, which is used for Pulsar Functions coordination, so the data loss isn't a problem. (If you aren't using Pulsar Functions, you could disable the feature completely.)
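
For illustration, here is a minimal sketch of how the autoSkipNonRecoverableData=true setting might be applied through the chart. It rests on assumptions not stated in this thread: that the chart passes keys under broker.configData into the broker configuration, that the chart repository is registered locally as apache/pulsar, and that the release should stay pinned to chart version 3.3.1 as shown in the helm list output above. The script only prints the helm command unless APPLY=1 is set, so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: builds a "helm upgrade" command that sets
# autoSkipNonRecoverableData=true for the brokers.
# ASSUMPTIONS (not confirmed in this thread): the values key is
# broker.configData.<name>, and the chart repo alias is "apache".
set -eu

RELEASE="${RELEASE:-pulsar}"              # release name from "helm list"
NAMESPACE="${NAMESPACE:-pulsar}"          # namespace from "helm list"
CHART="${CHART:-apache/pulsar}"           # assumed local repo alias/chart name
CHART_VERSION="${CHART_VERSION:-3.3.1}"   # pin to the currently deployed chart

# Print the helm arguments on one line.
# --set-string keeps "true" a string, since ConfigMap values must be strings.
upgrade_args() {
  printf '%s\n' "upgrade $RELEASE $CHART -n $NAMESPACE --version $CHART_VERSION --reuse-values --set-string broker.configData.autoSkipNonRecoverableData=true"
}

if [ "${APPLY:-0}" = "1" ]; then
  # Word splitting of the argument string is intentional here.
  # shellcheck disable=SC2046
  helm $(upgrade_args)
else
  # Dry run: show the command without touching the cluster.
  echo "helm $(upgrade_args)"
fi
```

Run it once without APPLY to check the printed command, then with APPLY=1 to execute it. The brokers most likely need a restart to pick up the changed value, and since the setting skips unreadable data, it seems safest to remove it again once the brokers are healthy.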

lhotari commented 5 months ago

Closing this since this isn't a pulsar-helm-chart issue.