DandyDeveloper / charts

Various helm charts migrated from [helm/stable] due to deprecation
https://dandydeveloper.github.io/charts
Apache License 2.0
156 stars 143 forks source link

Added splitbrain fix that checks if sentinel and redis agree about cl… #149

Closed zerkms closed 3 years ago

zerkms commented 3 years ago

…uster state

What this PR does / why we need it:

Sometimes after a master failover - sentinel does not switch its redis instance to the new master.

What this PR does is - it checks whether sentinel and redis agree about the cluster state, and if not - the configuration is reinitialised and redis is instructed to shutdown (so that it immediately was restarted with the correct config)

Which issue this PR fixes

(optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged)

Special notes for your reviewer:

This is a solution I successfully use on my cluster, but I'm totally open for discussion about possible changes.

Checklist

[Place an '[x]' (no spaces) in all applicable fields. Please remove unrelated fields.]

DandyDeveloper commented 3 years ago

@zerkms Thank you for this. I'll do some heavy testing locally as best I can.

It is more of a bandaid, but I do think a large portion of our restrictions come from how Redis / Sentinel decide to do things.

So a bandaid may very well be our only option.

Can you increment the chart minor version? I suspect the CI will fail with the current state of the PR.

zerkms commented 3 years ago

And another note: I'm not a professional shell script developer, so most likely my code needs to be polished :-D It works though!

zerkms commented 3 years ago

Yes to both questions for sure. Please allow me couple extra days though, then I can allocate a bit of time to do both.

DandyDeveloper commented 3 years ago

@zerkms Thank you! I'll keep an eye out, I don't see any issues other then the changes.

I'm not fond of a completely new container for this, but we really don't have a choice.

zerkms commented 3 years ago

I'm not fond of a completely new container for this, but we really don't have a choice.

I initially wanted to put it to the liveness probe, but having a fully separated container looks "cleaner" :-)

DandyDeveloper commented 3 years ago

@zerkms Agreed completely.

zerkms commented 3 years ago

Added a parameter and described the detection sidecar algorithm (I think it might be reworded to look more natural by someone who speaks proper English though).

DandyDeveloper commented 3 years ago

@zerkms Can you look at the CI? Seems to be failing.

zerkms commented 3 years ago

Hm cannot understand why it failed. Bash script does not handle signals so termination takes longer. Is that it?

zerkms commented 3 years ago

Okay, I see that it's a shellcheck fails (no wondering, I suck at shell scripting). Let's see if I can find a place where it says exactly what rules I triggered...

DandyDeveloper commented 3 years ago

@zerkms Merging!