DandyDeveloper / charts

Various helm charts migrated from [helm/stable] due to deprecation
https://dandydeveloper.github.io/charts
Apache License 2.0

[chart/redis-ha][BUG] check_if_redis_is_master backend fails tcp-check on QUIT command #217

Closed: agaudreault closed this issue 2 years ago

agaudreault commented 2 years ago

Describe the bug: The tcp-check used by the check_if_redis_is_master backends is invalid for Sentinel. As its last steps, the check sends the QUIT command, which, based on https://redis.io/docs/manual/sentinel/, is not valid for the Sentinel container it connects to. The check then fails at step 7 of tcp-check (expect string '+OK').
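
To reproduce the check outside HAProxy, the sequence can be replayed by hand. The following is a minimal sketch, not the chart's actual template: the step order (connect, PING, SENTINEL get-master-addr-by-name, QUIT) is inferred from the step numbers in the log output, and the master name `mymaster`, the helper names, and the 2-second timeout are assumptions for illustration.

```python
import socket
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, bytes]


def build_steps(expected_ip: str, master_name: str = "mymaster") -> List[Step]:
    """Assumed tcp-check sequence; numbering is 1-based like HAProxy's log."""
    return [
        ("connect", b""),                                   # step 1
        ("send", b"PING\r\n"),                              # step 2
        ("expect", b"+PONG"),                               # step 3
        ("send", f"SENTINEL get-master-addr-by-name {master_name}\r\n".encode()),  # step 4
        ("expect", expected_ip.encode()),                   # step 5
        ("send", b"QUIT\r\n"),                              # step 6
        ("expect", b"+OK"),                                 # step 7
    ]


def run_check(
    steps: List[Step],
    send: Callable[[bytes], object],
    recv: Callable[[], bytes],
) -> Optional[int]:
    """Return the number of the first failing expect step, or None if all pass."""
    for number, (kind, payload) in enumerate(steps, start=1):
        if kind == "send":
            send(payload)
        elif kind == "expect" and payload not in recv():
            return number
    return None


def check_sentinel(host: str, port: int, expected_ip: str, timeout: float = 2.0) -> Optional[int]:
    """Run the assumed sequence against a real Sentinel endpoint."""
    with socket.create_connection((host, port), timeout=timeout) as sock:

        def recv() -> bytes:
            try:
                return sock.recv(4096)
            except OSError:  # includes timeouts and connection resets
                return b""

        return run_check(build_steps(expected_ip), sock.sendall, recv)
```

Calling `check_sentinel(host, 26379, ip)` with a Sentinel address and the IP of the Redis node under test returns the failing step number, which can be compared with the step reported by HAProxy. The sketch reads a single buffer per expect step, so it is only a rough approximation of HAProxy's matching.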

[WARNING]  (8) : Server check_if_redis_is_master_0/R0 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string '172.20.221.218')", check duration: 2001ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING]  (8) : Server check_if_redis_is_master_0/R1 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string '172.20.221.218')", check duration: 2001ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING]  (8) : Server check_if_redis_is_master_0/R2 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string '172.20.221.218')", check duration: 2001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT]    (8) : backend 'check_if_redis_is_master_0' has no server available!
[WARNING]  (8) : Server check_if_redis_is_master_1/R0 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string '172.20.62.109')", check duration: 2001ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING]  (8) : Server check_if_redis_is_master_1/R1 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string '172.20.62.109')", check duration: 2001ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING]  (8) : Server check_if_redis_is_master_1/R2 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string '172.20.62.109')", check duration: 2001ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT]    (8) : backend 'check_if_redis_is_master_1' has no server available!
[WARNING]  (8) : Server check_if_redis_is_master_2/R0 is DOWN, reason: Layer7 timeout, info: " at step 7 of tcp-check (expect string '+OK')", check duration: 2001ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING]  (8) : Server check_if_redis_is_master_2/R1 is DOWN, reason: Layer7 timeout, info: " at step 7 of tcp-check (expect string '+OK')", check duration: 2001ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING]  (8) : Server check_if_redis_is_master_2/R2 is DOWN, reason: Layer7 timeout, info: " at step 7 of tcp-check (expect string '+OK')", check duration: 2037ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT]    (8) : backend 'check_if_redis_is_master_2' has no server available!
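
To make the pattern in these warnings easier to see, the failing step and expected string can be grouped per backend. This is a small sketch that assumes the log line format shown above; the regular expression and the `summarize` helper are illustrative, not part of HAProxy or the chart.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Matches the "Server <backend>/<server> is DOWN" warnings shown above.
LINE_RE = re.compile(
    r'Server (?P<backend>[^/\s]+)/(?P<server>\S+) is DOWN, reason: (?P<reason>[^,]+), '
    r'info: " at step (?P<step>\d+) of tcp-check \(expect string '
    r"'(?P<expected>[^']*)'\)"
)


def summarize(lines: Iterable[str]) -> Dict[str, Set[Tuple[int, str]]]:
    """Map each backend to the set of (failing step, expected string) pairs."""
    summary: Dict[str, Set[Tuple[int, str]]] = defaultdict(set)
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            summary[match["backend"]].add((int(match["step"]), match["expected"]))
    return dict(summary)
```

Applied to the output above, this groups check_if_redis_is_master_0 and check_if_redis_is_master_1 under step 5 (the IP expectation) and check_if_redis_is_master_2 under step 7 (the '+OK' expectation after QUIT).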

https://github.com/DandyDeveloper/charts/blob/6f523013a32beafe370e2fd55403067a6a5647b9/charts/redis-ha/templates/_configs.tpl#L511-L512

sum by (container, pod, proxy) (haproxy_backend_active_servers)

{container="haproxy", proxy="bk_redis_master"} | 1
{container="haproxy", proxy="check_if_redis_is_master_0"} | 0
{container="haproxy", proxy="check_if_redis_is_master_1"} | 0
{container="haproxy", proxy="check_if_redis_is_master_2"} | 0
{container="haproxy", proxy="health_check_http_url"} | 0

HAProxy version: 2.4.17
Redis version: 7.0.2

Expected behavior: One of the check_if_redis_is_master backends should have 3 servers available, and bk_redis_master should use the master instance.

Additional context: We use the chart as part of an ArgoCD deployment.