bitnami / charts

Bitnami Helm Charts
https://bitnami.com
Other
8.85k stars 9.13k forks source link

[bitnami/nats] readiness check on wrong endpoint #24360

Closed skoef closed 5 months ago

skoef commented 6 months ago

Name and Version

bitnami/nats 7.18.0

What architecture are you using?

amd64

What steps will reproduce the bug?

On my setup with 3 nats instances using Jetstream, one of the instances was restarted. Once nats had started, the pod instantly became ready. However, Jetstream hadn't fully synced yet so my clients connected to a non-ready nats instance 1/3rd of the time. I think this is due to the fact that the wrong endpoint of the monitoring port is probed for readiness.

Currently, the / is probed for readiness, which means that as soon as nats has started and exposes the monitoring port (in the chart this is 8222), the pod is considered ready. However, my pod wasn't ready to take traffic because jetstream hadn't fully synced yet. The monitoring service has a "Health probe" on /healthz which showed me something like {"status":"not ok", "error": "JetStream stream '$G > my-stream' is not current"} (I didn't clip the exact output) and gave a 503 - Service unavailable. After a while, when jetstream was synced, the output became {"status":"ok"} and gave a 200 status.

I think we should update the chart to use /healthz as path for the readiness probe instead (or at least make it configurable).

Are you using any custom parameters or values?

No response

What is the expected behavior?

Pod becomes ready once its health check succeeds

What do you see instead?

Pod instantly becomes ready.

Additional information

No response

skoef commented 6 months ago

I only just found out we can in fact already configure the readiness probe with customReadinessProbe, but I still think the default should be probing an endpoint that actually reflects the readiness.

joancafom commented 6 months ago

Hi @skoef

I agree and think that if nats itself provides a better subpath for checking readiness/liveliness we should use it instead of the plain / one. Do you want to contribute with a PR? The team will be more than happy to review it :)

github-actions[bot] commented 5 months ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

skoef commented 5 months ago

I didn't notice the merge conflict, I will fix it and update the PR!

skoef commented 5 months ago

The PR has been updated and rebased.

joancafom commented 5 months ago

Great! Someone from the team will take a look at it and provide feedback 😁