don't sync on split brain

crispyduck00 commented 2 years ago

Hi,

maybe I am using zrep wrong, or I fond a bigger problem.

I am runnig "zrep -S -f all" on both hosts as systemd timer, doing the takeover with a LXC pre start hook script of a Proxmox cluster VM.

Works great on normal migration with simple takeover. But if the src (sto1) is not available anymore the hook script is performing a takeover -L so my dest (sto2) becomes master.

Problem is now when sto1 comes available/online again. Now "zrep -S -f all" will do a send/receive even if both fs are master=yes.

No idea if this is a good way adding a check, but for now I am testing with adding following lines (copied from expire) after line 1976:

###ensure remote side is not master
    sanity=`zrep_ssh $desthost $ZFSGETLVAL ${ZREPTAG}:master $destfs`
    if [[ $? -ne 0 ]] ; then
        zrep_errquit "Could not get $desthost $destfs ${ZREPTAG}:master value"
    fi
    # Normally, dont quit on error. But this is super-bad.
    if [[ "$sanity" == "yes" ]] ; then
        zrep_errquit "Remote side also marked as master. to clear run on $desthost: zfs inherit ${ZREPTAG}:master $destfs"
    fi
###

With this it is doing nothing when both sides are master, which is okay in my case as I just use it for replication. If someone still wants to have local snapshots, it should maybe done on another place. And also for refresh, but I am no expert in this. ;-)

Br, cd

ppbrown commented 2 years ago

going both from my memory, and from looking at the code... zrep only does a forced takeover if you tell it to. by default, it tries to do a "nice" takeover, and stops if the remote side is unreachable.

So, seems like you left out the part where you explicitly tell it to force local takeover. You add -L yourself, yes?

Given that, it's on you to deal with avoiding splitbrain when you do that. The whole point of the default behaviour is to avoid splitbrain.

if you are doing a wrapper around zrep on takeover to check for other side and force -L... then it is also your responsibility to do a wrapper around the other side's call to zrep, and do what you think is appropriate if the other side is down. Any change to zrep to accommodate your non-default usage, would impede everyone else's default usage.

On Wed, Mar 9, 2022 at 4:12 AM crispyduck00 @.***> wrote:

Hi,

maybe I am using zrep wrong, or I fond a bigger problem.

I am runnig "zrep -S -f all" on both hosts as systemd timer, doing the takeover with a LXC pre start hook script of a Proxmox cluster VM.

Works great on normal migration with simple takeover. But if the src (sto1) is not available anymore the hook script is performing a takeover -L so my dest (sto2) becomes master.

Problem is now when sto1 comes available/online again. Now "zrep -S -f all" will do a send/receive even if both fs are master=yes.

No idea if this is a good way adding a check, but for now I am testing with adding following lines (copied from expire) after line 1976:

ensure remote side is not master sanity=zrep_ssh $desthost $ZFSGETLVAL

${ZREPTAG}:master $destfs` if [[ $? -ne 0 ]] ; then zrep_errquit "Could not get $desthost $destfs ${ZREPTAG}:master value" fi

Normally, dont quit on error. But this is super-bad.

if [[ "$sanity" == "yes" ]] ; then zrep_errquit "Remote side also marked as master. to clear run on $desthost: zfs inherit ${ZREPTAG}:master $destfs" fi

` With this it is doing nothing when both sides are master, which is okay in my case as I just use it for replication. If someone still wants to have local snapshots, it should maybe done on another place. And also for refresh, but I am no expert in this. ;-)

Br, cd

— Reply to this email directly, view it on GitHub https://github.com/bolthole/zrep/issues/181, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANEV6KVSJYXH7BLHV2J3W3U7CIUPANCNFSM5QJLILUA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you are subscribed to this thread.Message ID: @.***>

crispyduck00 commented 2 years ago

Hi,

yes, I do a takeover -L on the dest host (host2) when the src host (host1) is not available for what ever reason.

This is fine and works as expected. But now for what ever reason host1 becomes available again and maybe because of this missing wrapper, a mistake,... or what ever both hosts see each other again and host1 is again doing a zrep -S -f all. In this case host1 will e.g. delete fs that were created on host2 while it was offline.

Sure, this can be done with a wrapper but I think there should be a built in mechanism to never ever do something from master to master. You are also doing this on expire, so would be good to have also a check before doing a recv -F, which would maybe delete datasets.

As said, just my opinion.

Br

crispyduck00 commented 2 years ago

Do you think it would be a problem doing a check as mentioned in the zrep_sync() function?

Tested it several times with it, and so it will never do anything while both are have the master:yes.

And when the host1 really comes back, maybe it was just a power outage,... I can simply do the "zfs inherit zrep:master" on it and do a failover from host2 back to host1 as both still have ident snapshots.

Beside this it is already a check if ssh to the dest host is working.

ppbrown commented 2 years ago

I think there should be a built in mechanism to never ever do something from master to master.

that is a fair point. I'll look into it.

ppbrown commented 2 years ago

Please test out latest commit and see if it fits your needs.

crispyduck00 commented 2 years ago

Great! Do you have a donation page; I will buy you a coffee for this fast response! ;-)

Anything against changing it to:

    typeset remotemaster=`zrep_ssh $desthost $ZFSGETLVAL ${ZREPTAG}:master $destfs`
    if [[ "$remotemaster" == "yes" ]] ; then
        zrep_errquit "Other side ($desthost:$destfs) is also master. Split brain detected"
    elif [[ $? -ne 0 ]] ; then
        zrep_errquit "ssh to $desthost not possible. Check connection!"
    fi

I think this would at the same time check if the ssh connection is possible and skip the rest if not.

ppbrown commented 2 years ago

Good suggestion. just pushed an update. As far as donations, please either get me something on my amazon wishlist, or send me a gift card (Amazon USA) :)

details at the bottom of http://www.bolthole.com/solaris/zrep/zrep.documentation.html

crispyduck00 commented 2 years ago

Hmmm, not working as expected. Was already wondering why my one does not work, I am no programmer and my scripting skills are limited, I expected the $? -ne 0 would be triggered when the ssh command fails, but seems in this way it is not working. When I remove the typeset then it works., ???

details at the bottom of http://www.bolthole.com/solaris/zrep/zrep.documentation.html

Ahh, there on the bottom was it. Will check it. ;-)

ppbrown commented 2 years ago

try with

    typeset remotemaster
    remotemaster=`zrep_ssh $desthost $ZFSGETLVAL ${ZREPTAG}:master`

crispyduck00 commented 2 years ago

Yeah, this works!

I expect it is a wanted behavior to still create all the local snapshots when the remote host is not available?

In this case they will/can not expire. Doing this without checking before calling zrep will create a lot of new never sent snapshots.

ppbrown commented 2 years ago

The thing is, there are multiple things that call into _sync(). So its the only single point that I can add the "hey, is remote a master?" check. Otherwise I'd have to duplicate it in multiple places.

As a counterpoint: yes, some people will find value in making the snapshots even when remote is not available. They like "hourly snapshots" for recovery purposes.

crispyduck00 commented 2 years ago

Yes, saw it is in multiple places. For my case, only push vis -S it would make sense in zrep_sync(), but there are several others calling directly _sync().

And yes, if someone uses it for doing regular snapshots even if remote host is unavailable this might break things.

But good to not do master to master anymore, as this could lead to data loss. And also a send is unnecessary if remote host is not reachable.

I have to think how I will do it. Maybe I will simply add the check if remote host is available in my zrepsync.sh script before calling zrep. For sure better than running a modified version of zrep.

Thanks!

ppbrown commented 2 years ago

Great.

btw:

I havent heard back from the other folks who commented on https://github.com/bolthole/zrep/pull/178

If you'd like to do a bit of poking to let me know that yes, the code for that actually works as expected, then I could make an official new zrep release on the current code.

crispyduck00 commented 2 years ago

Hmm, I just tested it, but out of the box it is not working for me.

First I tried to simply use it on one already initialized and working zrep sync fs, and now also from scratch following the documentation:

Set up the filesystem initially on the backup server
Do "zrep init {localfs} {clienthost} {clientfs}"
"zrep failover {localfs}" #to make the client the data master.
zrep refresh pool/fs

But on both I always get: Error: Unrecognized output from src snap. Cannot continue

Could this be related to #178 or I am just doing something wrong.

Back to this one, would it be a big thing adding an additional parameter, or something else to _errquit and skip everything if remote host is not available, or both are master? No Idea how to call it, -quitonerr, something that zrep is not doing anything if remote host is not available or a master conflict is detected. Maybe by adding this check as a own function that respects both ways and call it on every zrep execution where the parameter is given? I am using zrep only for replication with savecount 1. Zrep is perfectly doing this 1-1 replication, and with -f also destroying deleted datasets on dest if they are deleted locally. For backup and snapshot rotation I use zfs-auto-snapshot. So I don't want and don't need zrep to make snapshots when a sync is anyway not possible.

I know, this would be a bigger thing and require changes on several places, maybe I can do it by my own, but not sure if my skills are sufficient to do it correctly without missing a part somewhere. And on the other hand if it will anyway never make a way into zrep, I will do it just for myself, or keep it in a wrapper and avoid calling zrep directly.

ppbrown commented 2 years ago

The check-for-remote is probably a big effort and needs a separate issue request.

for the refresh check... if I introduced a bug with #178, it would be likely that things would fail when using -R, but still work without it.

crispyduck00 commented 2 years ago

Jup, without -R it works. So seems to be a bug added with #178

Should I/we do a separate issue, for check-for-remote? Do you have time for this, or anyway a no go at the moment?

ppbrown commented 2 years ago

needs to be a separate issue, because I dont have time for it at the moment

crispyduck00 commented 2 years ago

Ok. Maybe I write one for it, or we just leave it as it is and I build my wrapper. Don't know it yet. ;-)

Hope you will find some time for a book, as it should be on the way. ;-) Thanks so far!

crispyduck00 commented 2 years ago

Hi, would suggest to use $ZFSGETVAL instead of $ZFSGETLVAL, took me a while to find out why it is still doing something when both are :master=yes ;-)

So with also the typset prior it should be like this:

#   typeset remotemaster
#   remotemaster=`zrep_ssh $desthost $ZFSGETVAL ${ZREPTAG}:master $destfs`
#   if [[ $? -ne 0 ]] ; then
#       zrep_errquit "$desthost is not reachable via ssh? Cannot sync"
#   fi
#   if [[ "$remotemaster" == "yes" ]] ; then
#       zrep_errquit "Other side ($desthost:$destfs) is also master. Split brain detected"
#   fi

ppbrown commented 2 years ago

Hmm.. That doent sound right. I consistently use ZFSGETLVAL everywhere else to get master status, so I think it is required. we need to find how to make it work with ZFSGETLVAL

To validate this, try manually ssh to the other side, and then run yourself, zfs get -H -o value -s local zrep:master /fs/name and see whether it gives expected value or not.

crispyduck00 commented 2 years ago

Hi, yes you are right. Realized it after writing this. If not using local it will always be triggert as after a takeover both have the master, but only one the local master.

ppbrown commented 2 years ago

PS: finishing up comment to your question of, "I expect it is a wanted behavior to still create all the local snapshots when the remote host is not available? In this case they will/can not expire."

to be more accurate, they wont expire.. UNTIL the connection comes back. Then they should get automatically cleaned up soon after next sync.

ppbrown commented 2 years ago

I think I have addressed everything directly related to this issue, so I am closing it. If I forgot something directly to do with the split brain, please feel free to reopen. otherwise, for anything else not covered, but not related to the title, please open a new issue.

ppbrown commented 2 years ago

btw: i just unpacked the book you sent. I dont think I have a direct email for you, so... Thank you so much!! :)

On Thu, Mar 10, 2022 at 8:30 AM crispyduck00 @.***> wrote:

Ok. Maybe I write one for it, or we just leave it as it is and I build my wrapper. Don't know it yet. ;-)

Hope you will find some time for a book, as it should be on the way. ;-) Thanks so far!

— Reply to this email directly, view it on GitHub https://github.com/bolthole/zrep/issues/181#issuecomment-1064257847, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANEV6KM5LRLH3B6USZ4IR3U7IPSLANCNFSM5QJLILUA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

crispyduck00 commented 2 years ago

Great! Thanks for zrep and your support!

bolthole / zrep

don't sync on split brain #181

ensure remote side is not master sanity=zrep_ssh $desthost $ZFSGETLVAL

Normally, dont quit on error. But this is super-bad.