dragonflydb / dragonfly-operator

A Kubernetes operator to install and manage Dragonfly instances.
https://www.dragonflydb.io/docs/managing-dragonfly/operator/installation
Apache License 2.0

Try harder to failover on recover from master loss #250

Open applike-ss opened 1 month ago

applike-ss commented 1 month ago

Regarding this: https://github.com/dragonflydb/dragonfly-operator/blob/64cfcbae58dc68c600f313b344e6ad19ad332fe6/internal/controller/dragonfly_instance.go#L116

and this: https://github.com/dragonflydb/dragonfly-operator/blob/64cfcbae58dc68c600f313b344e6ad19ad332fe6/internal/controller/dragonfly_instance.go#L117

We just observed this behavior, and in the logs I found this error: error running SLAVE OF command: dial tcp 10.138.59.180:9999: i/o timeout. I assume that one of the following happened:

Because of this, I would like to suggest the following changes:

Pothulapati commented 2 weeks ago

Thanks @applike-ss for the issue!

All the suggestions seem valid, and are easy enough to implement.