bolthole / zrep

ZREP ZFS based replication and failover script from bolthole.com

zrep sync fails after solaris 11 upgrade from SRU60 to SRU68 #207

Open normans2 opened 5 months ago

normans2 commented 5 months ago

zrep -S all was running OK via crontab and stopped working following a Solaris OS upgrade from SRU60 to SRU68. ZREP_VERSION=2.0.2

I cleared out the zrep data and the replicated datasets on the secondary, then re-initialized the parent dataset. That worked OK, and I can see snapshots on both sides.

    # zfs list -t snapshot | grep zrep
    data/repo@zrep_000000                    0  -   144K  -
    data/repo/sol11-4-full@zrep_000000       0  -  10.2G  -
    data/repo/sol11.4.56.138.2@zrep_000000   0  -    26G  -
    data/repo/sol11.4.60.151.2@zrep_000000   0  -  39.2G  -
    data/repo/sol11.4.68.164.2@zrep_000000   0  -  52.4G  -

    # zfs list -t snapshot | grep zrep
    data/repo@zrep_000000                    0  -   144K  -
    data/repo/sol11-4-full@zrep_000000       0  -  10.2G  -
    data/repo/sol11.4.56.138.2@zrep_000000   0  -    26G  -
    data/repo/sol11.4.60.151.2@zrep_000000   0  -  39.2G  -
    data/repo/sol11.4.68.164.2@zrep_000000   0  -  52.4G  -

On the first sync I get the following errors:

    Expiring zrep snaps on data/repo
    Also running expire on :data/repo now...
    Expiring zrep snaps on data/repo
    cannot open 'data/repo/sol11-4-full@zrep_000000': snapshot does not exist
    cannot destroy 'data/repo/sol11-4-full@zrep_000000': snapshot does not exist
    no snapshots destroyed
    cannot open 'data/repo/sol11-4-full@zrep_000001': snapshot does not exist
    cannot destroy 'data/repo/sol11-4-full@zrep_000001': snapshot does not exist
    no snapshots destroyed
    cannot open 'data/repo/sol11.4.56.138.2@zrep_000000': snapshot does not exist
    cannot destroy 'data/repo/sol11.4.56.138.2@zrep_000000': snapshot does not exist
    no snapshots destroyed

The snapshots on the secondary no longer exist. Strangely, only 3 of the snapshots appear in the error output, and one of them is numbered 000001 rather than 000000.

Any subsequent sync request results in:

    cannot receive: destination data/repo has been modified since most recent snapshot

I have not been able to trace what has introduced this behaviour since upgrading the Solaris OS. Any help would be greatly appreciated.

ppbrown commented 5 months ago

Hm. I'm afraid I can't test this myself, as I no longer run Solaris. Best to ask for help on the forums. It seems like they broke compatibility in some way.

Either this is a bug, or it is a deliberate feature transition. If it isn't just a bug, and you find a definitive answer and solution, please let me know.


normans2 commented 5 months ago

Hi, thanks for the quick response. I will reach out to the forums as suggested.

okapia commented 3 months ago

I hit what looks like the same issue as this. In Solaris, they appear to have changed the meaning of the -d flag to zfs list. Previously, and still with OpenZFS, the specified depth counted the snapshot level, so -d 1 would get the top-level snapshots. Now the depth applies only to the actual datasets. I can see why they would have seen this as an improvement, but changing an established interface in this manner is really bad.

The solution that worked for me was to modify the zrep source and set DEPTHCAP="-d 0". On older Solaris and OpenZFS, -d 0 lists nothing, but on newer Solaris it lists just the snapshots of the specified top-level dataset.

This is a fairly simple workaround and the problem may affect others so some sort of adaptation for zrep, even if only in documentation would be good.
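
As a rough illustration of what such an adaptation might look like, the sketch below probes how the local zfs treats -d 0 and picks the flag accordingly. It is not taken from the zrep source: the function name pick_depthcap and its optional second argument (an alternative command to run in place of zfs) are hypothetical. It relies on the behaviour described above, and it only gives a meaningful answer when the dataset already has at least one snapshot.

```shell
# Hypothetical helper, not part of zrep: choose a value for DEPTHCAP
# by checking whether "-d 0" returns the dataset's own snapshots.
#   $1 = dataset that is known to have at least one snapshot
#   $2 = command to run instead of "zfs" (optional, assumed for testing)
pick_depthcap() {
    dataset=$1
    zfs_cmd=${2:-zfs}

    # Newer Solaris: -d 0 lists the snapshots of the dataset itself.
    # Older Solaris / OpenZFS: -d 0 lists no snapshots at all.
    if "$zfs_cmd" list -H -o name -d 0 -t snapshot "$dataset" 2>/dev/null \
        | grep -q '@'
    then
        echo "-d 0"
    else
        # Fall back to the older meaning, where -d 1 reaches the
        # top-level snapshots.
        echo "-d 1"
    fi
}
```

It could then be used as DEPTHCAP=$(pick_depthcap data/repo) before any snapshot listing. If the dataset has no snapshots yet, the probe cannot tell the two behaviours apart and falls back to -d 1, so setting the value by hand, as above, remains the safer choice on a freshly initialized dataset.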

normans2 commented 3 months ago

Thanks okapia, I will give your suggestion a try.

normans2 commented 3 months ago

Worked for me, thanks again :)