pedep closed this issue 5 years ago.
Nice catch, @pedep! This is indeed a bug; I didn't test much with emptyDir.
I think your patch should fix this issue.
I will be happy to review and merge a PR with the fix.
@AMecea Thanks :smile:
I will try my hand at a PR shortly.
I set up a 3-node MySQL cluster to play around with mysql-operator.
When I drain the node hosting mysql-0, the pod seems unable to restore its data from a sibling or the master after it has been rescheduled onto another node. On inspection, the sidecar fails with the error raised here: https://github.com/presslabs/mysql-operator/blob/c26526ee7be6e22d0f2825e7ce33ae71781b87e3/pkg/sidecar/appclone/appclone.go#L73
Since I am using emptyDir, the clone-mysql sidecar should download the data from the current master or a sibling, but because the pod's serverId is 100, it goes straight to the error above: https://github.com/presslabs/mysql-operator/blob/c26526ee7be6e22d0f2825e7ce33ae71781b87e3/pkg/sidecar/appclone/appclone.go#L65
It seems some kind of recovery option for pod 0 is needed. I would suggest something along the lines of this
I don't think this will cause the pod to try to connect to itself for recovery, because of this check above: https://github.com/presslabs/mysql-operator/blob/c26526ee7be6e22d0f2825e7ce33ae71781b87e3/pkg/sidecar/appclone/appclone.go#L52
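To make the suggested recovery flow concrete, here is a minimal, hypothetical Go sketch of how a clone step could pick its source so that pod 0 (serverId 100) is no longer special-cased: it prefers the current master, falls back to any sibling, and always skips its own host, in the spirit of the self-check linked above. This is not the patch referred to in this issue and not the operator's actual code; chooseCloneSource, errNoCloneSource, and the short hostnames are assumptions made for illustration. How a brand-new cluster with no peers should bootstrap is deliberately left to the caller.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoCloneSource is a hypothetical error returned when no peer can be cloned from.
var errNoCloneSource = errors.New("no master or sibling available to clone from")

// chooseCloneSource is a hypothetical helper: every pod, including pod 0,
// first tries the current master and then any sibling, never itself.
func chooseCloneSource(self, master string, siblings []string) (string, error) {
	// Prefer the current master, unless this pod is the master itself.
	if master != "" && master != self {
		return master, nil
	}
	// Otherwise fall back to the first sibling that is not this pod.
	for _, host := range siblings {
		if host != "" && host != self {
			return host, nil
		}
	}
	// No peer to clone from; the caller decides what to do (e.g. fresh bootstrap).
	return "", errNoCloneSource
}

func main() {
	// Illustrative scenario: mysql-0 was rescheduled with an empty emptyDir
	// volume and mysql-1 has since been promoted to master.
	src, err := chooseCloneSource("my-cluster-mysql-0", "my-cluster-mysql-1",
		[]string{"my-cluster-mysql-0", "my-cluster-mysql-1", "my-cluster-mysql-2"})
	fmt.Println(src, err) // prints "my-cluster-mysql-1 <nil>"
}
```

Returning an error rather than silently initializing an empty node keeps a rescheduled pod 0 from coming back with no data when peers are merely unreachable.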
The easiest way to reproduce this behaviour is to create a new cluster with volumeSpec.emptyDir: {} and a few replicas, and then delete the my-cluster-mysql-0 pod.