We just observed this behavior, and in the logs I discovered this error: `error running SLAVE OF command: dial tcp 10.138.59.180:9999: i/o timeout`. So I assume one of the following happened:
- a network issue
- the Dragonfly main/networking thread was blocked
- Dragonfly crashed without killing the process
Because of this, I would like to suggest the following changes (a rough sketch of the check follows below):
- Check via a Redis client that the operator can talk to the new master before promoting it.
- Check via a Redis client that the operator can talk to the (now) replicas before setting them as slaves of the new master.
- Kill the pod if the operator can't talk to it after X tries (configurable? 0 meaning "do not kill it"?).
Regarding this: https://github.com/dragonflydb/dragonfly-operator/blob/64cfcbae58dc68c600f313b344e6ad19ad332fe6/internal/controller/dragonfly_instance.go#L116
and this: https://github.com/dragonflydb/dragonfly-operator/blob/64cfcbae58dc68c600f313b344e6ad19ad332fe6/internal/controller/dragonfly_instance.go#L117
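To make the suggestion concrete, here is a minimal, hypothetical Go sketch (not the operator's actual code) using go-redis and client-go: ping the target pod before promoting it or sending SLAVE OF, and delete the pod if it stays unreachable after a configurable number of tries, with 0 meaning "check but never kill". The function name, parameters, and retry/kill policy are all made up for illustration.

```go
package failover

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// ensureReachableOrKill is a hypothetical pre-check: ping addr up to maxTries
// times before the operator promotes the pod or sends SLAVE OF to it.
// maxTries == 0 means "check once, report the error, but never kill the pod".
func ensureReachableOrKill(ctx context.Context, k8s kubernetes.Interface,
	namespace, podName, addr string, maxTries int) error {

	rdb := redis.NewClient(&redis.Options{
		Addr:        addr,
		DialTimeout: 3 * time.Second,
	})
	defer rdb.Close()

	tries := maxTries
	if tries == 0 {
		tries = 1 // still check once; 0 only disables the kill below
	}

	var lastErr error
	for i := 0; i < tries; i++ {
		if lastErr = rdb.Ping(ctx).Err(); lastErr == nil {
			return nil // pod answers, safe to promote it / issue SLAVE OF
		}
		time.Sleep(2 * time.Second)
	}

	if maxTries == 0 {
		// Configured to never kill: just surface the error to the reconciler.
		return fmt.Errorf("pod %s unreachable: %w", podName, lastErr)
	}

	// Still unreachable after maxTries: delete the pod so it is recreated,
	// instead of leaving the replication setup half-configured.
	if err := k8s.CoreV1().Pods(namespace).Delete(ctx, podName, metav1.DeleteOptions{}); err != nil {
		return fmt.Errorf("pod %s unreachable (%v), delete also failed: %w", podName, lastErr, err)
	}
	return fmt.Errorf("pod %s unreachable after %d tries, pod deleted: %w", podName, maxTries, lastErr)
}
```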