elastio / elastio-snap

kernel module for taking block-level snapshots and incremental backups of Linux block devices
GNU General Public License v2.0

After reload-incremental will cause error when mount image #194

Open jamesruic opened 1 year ago

jamesruic commented 1 year ago

Hi, I'm testing the elioctl reload-incremental and reload-snapshot commands with the latest code and found a problem: after elioctl reload-incremental or reload-snapshot, I apply the tracked changes to the image, and the resulting image then fails to mount with an error.

Here are my test steps and virtual machine information. The VM is CentOS 7.9 on VMware:

[root@localhost ~]# uname -a
Linux localhost.localdomain 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

[root@localhost ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 908M     0  908M   0% /dev
tmpfs                    919M     0  919M   0% /dev/shm
tmpfs                    919M  8.9M  910M   1% /run
tmpfs                    919M     0  919M   0% /sys/fs/cgroup
/dev/mapper/centos-root   14G  1.8G   12G  13% /
/dev/sdb1                2.0G   33M  2.0G   2% /data
/dev/sda2               1014M  143M  872M  15% /boot
/dev/sda1                200M   12M  189M   6% /boot/efi
tmpfs                    184M     0  184M   0% /run/user/0
/dev/sdc1                 50G   33M   50G   1% /mnt

[root@localhost ~]# lsblk -fp
NAME                        FSTYPE      LABEL           UUID                                   MOUNTPOINT
/dev/sda
├─/dev/sda1                 vfat                        4363-9EE4                              /boot/efi
├─/dev/sda2                 xfs                         3b2b2be2-5144-4004-a677-b637cd956f3c   /boot
└─/dev/sda3                 LVM2_member                 4hODIp-EdZc-ehjb-bM2X-liUE-9BaH-LoO5Tk
  ├─/dev/mapper/centos-root xfs                         ccbc5ed4-d7ad-4fd2-8716-d6d65da8ca6b   /
  └─/dev/mapper/centos-swap swap                        447468cd-858d-4139-90e0-0cae99e27322   [SWAP]
/dev/sdb
└─/dev/sdb1                 xfs                         f7eb5852-c43e-497c-b02c-aeed0c79d570   /data
/dev/sdc
└─/dev/sdc1                 xfs                         ea385966-6f1e-433f-90ab-0833c661da90   /mnt
/dev/sr0                    iso9660     CentOS 7 x86_64 2020-11-04-11-36-43-00

Here's my test steps:

1. Change the reload script, then build the RPM packages:

[root@localhost ~]# cat elastio-snap/dist/initramfs/dracut/elastio-snap.sh
#!/bin/sh

type getarg >/dev/null 2>&1 || . /lib/dracut-lib.sh

modprobe elastio-snap

[ -z "$root" ] && root=$(getarg root=) [ -z "$rootfstype" ] && rootfstype=$(getarg rootfstype=)

rbd="${root#block:}" if [ -n "$rbd" ]; then case "$rbd" in LABEL=) rbd="$(echo $rbd | sed 's,/,\x2f,g')" rbd="/dev/disk/by-label/${rbd#LABEL=}" ;; UUID=) rbd="/dev/disk/by-uuid/${rbd#UUID=}" ;; PARTLABEL=) rbd="/dev/disk/by-partlabel/${rbd#PARTLABEL=}" ;; PARTUUID=) rbd="/dev/disk/by-partuuid/${rbd#PARTUUID=}" ;; esac

echo "elastio-snap: root block device = $rbd" > /dev/kmsg

# Device might not be ready
if [ ! -b "$rbd" ]; then
    udevadm settle
fi

# Kernel cmdline might not specify rootfstype
[ -z "$rootfstype" ] && rootfstype=$(blkid -s TYPE "$rbd" -o value)

echo "elastio-snap: mounting $rbd as $rootfstype" > /dev/kmsg
blockdev --setro $rbd
mount -t $rootfstype -o ro "$rbd" /etc/elastio/dla/mnt
udevadm settle

if [ -x /etc/elastio/dla/mnt/elastio-reload ]; then
    /etc/elastio/dla/mnt/elastio-reload
else
    echo "elastio-snap: error: cannot reload tracking data: missing /sbin/elastio_reload" > /dev/kmsg
fi

umount -f /etc/elastio/dla/mnt
blockdev --setrw $rbd

fi

[root@localhost ~]# cat /elastio-reload

#!/bin/sh

modprobe elastio-snap -d /etc/elastio/dla/mnt
elioctl reload-incremental /dev/sdb1 /.snapshot0 0


2. Create a snapshot of /dev/sdb1 and copy the snapshot image to another disk (/dev/sdc):

[root@localhost ~]# elioctl setup-snapshot /dev/sdb1 /data/.snapshot0 0
[root@localhost ~]# cat /proc/elastio-snap-info
{
    "version": "0.11.0",
    "devices": [
        {
            "minor": 0,
            "cow_file": "/.snapshot0",
            "block_device": "/dev/sdb1",
            "max_cache": 314572800,
            "fallocate": 213909504,
            "seq_id": 1,
            "uuid": "ae776b8c35124ea4b9eeeb8cebbb8034",
            "version": 1,
            "nr_changed_blocks": 0,
            "state": 3
        }
    ]
}

[root@localhost ~]# dd if=/dev/elastio-snap0 of=/mnt/mydisk bs=4M
511+1 records in
511+1 records out
2145386496 bytes (2.1 GB) copied, 4.23293 s, 507 MB/s


3. Put the snapshot into incremental mode then reboot to trigger `reload-incremental`:

[root@localhost ~]# elioctl transition-to-incremental 0
[root@localhost ~]# cat /proc/elastio-snap-info
{
    "version": "0.11.0",
    "devices": [
        {
            "minor": 0,
            "cow_file": "/.snapshot0",
            "block_device": "/dev/sdb1",
            "max_cache": 314572800,
            "fallocate": 213909504,
            "seq_id": 1,
            "uuid": "ae776b8c35124ea4b9eeeb8cebbb8034",
            "version": 1,
            "nr_changed_blocks": 9,
            "state": 2
        }
    ]
}

[root@localhost ~]# reboot


4. Check `/proc/elastio-snap-info` and add a new file:

[root@localhost ~]# cat /proc/elastio-snap-info
{
    "version": "0.11.0",
    "devices": [
        {
            "minor": 0,
            "cow_file": "/.snapshot0",
            "block_device": "/dev/sdb1",
            "max_cache": 314572800,
            "fallocate": 213909504,
            "seq_id": 1,
            "uuid": "ae776b8c35124ea4b9eeeb8cebbb8034",
            "version": 1,
            "nr_changed_blocks": 9,
            "state": 2
        }
    ]
}

[root@localhost ~]# touch /data/tempfile2
[root@localhost ~]# ls -la /data/
total 4104
drwxr-xr-x.  2 root root      57 Nov 16 14:40 .
dr-xr-xr-x. 18 root root     277 Nov 16 14:34 ..
----------.  1 root root 4198400 Nov 16 14:38 .snapshot0
-rw-r--r--.  1 root root       6 Nov 16 14:25 tempfile
-rw-r--r--.  1 root root       0 Nov 16 14:40 tempfile2

5. Switch back into snapshot mode and apply the changed blocks to the image:

[root@localhost ~]# elioctl transition-to-snapshot /.snapshot1 0
[root@localhost ~]# update-img /dev/elastio-snap0 /data/.snapshot0 /mnt/mydisk
snapshot is 523776 blocks large
copying blocks
copying complete: 13 blocks changed, 0 errors

6. Move the image to another VM and try to mount it; it fails with errors:

[root@localhost2 ~]# mount /mnt/mydisk /test/
mount: wrong fs type, bad option, bad superblock on /dev/loop0,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

[root@localhost2 ~]# dmesg
[   32.873553] XFS (loop0): Mounting V5 Filesystem
[   32.884892] XFS (loop0): Corruption warning: Metadata has LSN (1:2303) ahead of current LSN (1:2271). Please unmount and run xfs_repair (>= v4.3) to resolve.
[   32.884894] XFS (loop0): log mount/recovery failed: error -22
[   32.884918] XFS (loop0): log mount failed


The same problem occurs when `elioctl reload-snapshot` is used in this test.
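For reference, the reload-snapshot variant of /elastio-reload would be just the script above with the subcommand swapped (a sketch, assuming the same device and CoW file paths as above):

#!/bin/sh

modprobe elastio-snap -d /etc/elastio/dla/mnt
elioctl reload-snapshot /dev/sdb1 /.snapshot0 0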
kgermanov commented 1 year ago

@105590023 Looks like the same as https://github.com/elastio/elastio-snap/issues/63. Is it reproducible without a reboot (a rough sketch of what I mean follows the script below)? Another direction: could you test the reboot with a shutdown script?

root@user-vm:/home/kgermanov# cat  /lib/systemd/system-shutdown/umount_rootfs.shutdown
#!/bin/sh

sync
mount -o remount,ro /
umount /
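
A without-reboot reproduction might look roughly like this, after the initial setup-snapshot and dd from steps 1-2 above (an untested sketch; it assumes elioctl destroy exists and that reload-incremental can be run against the unmounted /dev/sdb1 outside the initramfs):

[root@localhost ~]# elioctl transition-to-incremental 0
[root@localhost ~]# umount /data
[root@localhost ~]# elioctl destroy 0
[root@localhost ~]# elioctl reload-incremental /dev/sdb1 /.snapshot0 0
[root@localhost ~]# mount /dev/sdb1 /data
[root@localhost ~]# touch /data/tempfile2
[root@localhost ~]# elioctl transition-to-snapshot /.snapshot1 0
[root@localhost ~]# update-img /dev/elastio-snap0 /data/.snapshot0 /mnt/mydisk
[root@localhost ~]# mount /mnt/mydisk /test/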
e-kov commented 1 year ago

@105590023 There is another interesting case to check. Does this issue with the Corruption warning in dmesg happen with an ext4 FS? If not, @kgermanov is right; it looks like #63.
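An ext4 check could reuse the same steps on the test disk, e.g. (a sketch only; it re-creates /data as ext4, so the existing test data would be lost):

[root@localhost ~]# umount /data
[root@localhost ~]# mkfs.ext4 /dev/sdb1
[root@localhost ~]# mount /dev/sdb1 /data
[root@localhost ~]# elioctl setup-snapshot /dev/sdb1 /data/.snapshot0 0
... then the same dd / transition-to-incremental / reboot / update-img / mount steps as above ...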

jamesruic commented 1 year ago

@kgermanov Thank you for your reply. I used the shutdown script and still get the error.

[root@localhost ~]# cat /lib/systemd/system-shutdown/umount_rootfs.shutdown
#!/bin/sh

sync
mount -o remount,ro /
umount /
[root@localhost ~]# dmesg
[   45.309376] loop: module loaded
[   45.325215] XFS (loop0): Mounting V5 Filesystem
[   45.336186] XFS (loop0): Corruption warning: Metadata has LSN (1:2704) ahead of current LSN (1:2679). Please unmount and run xfs_repair (>= v4.3) to resolve.
[   45.336188] XFS (loop0): log mount/recovery failed: error -22
[   45.336208] XFS (loop0): log mount failed
jamesruic commented 1 year ago

@e-kov Thank you for your reply. It works fine with an ext4 FS.

kgermanov commented 1 year ago

@jamesruic Could you retest with these steps?

[root@localhost ~]# cat /elastio-reload
#!/bin/sh
elioctl reload-snapshot /dev/sdb1 /.snapshot0 0

[root@localhost ~]# xfs_freeze /data
[root@localhost ~]# sync
[root@localhost ~]# elioctl setup-snapshot -c 10 -f 200 /dev/sdb1 /data/.snapshot0 0
[root@localhost ~]# xfs_freeze -u /data
[root@localhost ~]# mount /dev/elastio-snap0 /test/ && sleep 1 && umount /test
[root@localhost ~]# dmesg | grep elastio
[root@localhost ~]# systemctl start reboot.target
<after reboot>
[root@localhost ~]# mount /dev/elastio-snap0 /test/
[root@localhost ~]# dmesg
e-kov commented 1 year ago

@kgermanov I'm afraid elioctl setup-snapshot will hang after xfs_freeze, because it can't allocate the CoW file on the frozen FS.

kgermanov commented 1 year ago

@e-kov yes, you are right(

anelson commented 1 year ago

@e-kov is incremental after reboot broken in general?

e-kov commented 1 year ago

@anelson No, it's not broken in general. The issue is with XFS only; this is a manifestation of the problem with mounting and the XFS log described in #63.
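
As a side note, a quick way to check that only the XFS log is inconsistent (and the rest of the image data is intact) is to mount the image read-only with log recovery disabled; this is just a diagnostic sketch, not a fix:

[root@localhost2 ~]# mount -o ro,norecovery,loop /mnt/mydisk /test/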

anelson commented 1 year ago

Discussed on planning. Scope is clear now.

anelson commented 1 year ago

This is technically a duplicate of #63, however @e-kov has asked to keep this issue open separately as it contains another useful scenario with which to validate a future fix of #63.

jamesruic commented 1 year ago

Is it possible to use register_reboot_notifier to do some processing on the block device before the system shuts down? Maybe register a notifier via the register_reboot_notifier() function at module init, and do something like transitioning to snapshot mode or freezing the device. I'm not sure if it will help. https://elixir.bootlin.com/linux/v3.10/source/kernel/sys.c#L344

// Invoked from the kernel's reboot notifier chain on shutdown/reboot;
// this is where the module could transition tracked devices to
// snapshot mode or freeze them before the final unmount.
int shutdown_notification(struct notifier_block *nb, unsigned long action, void *unused) {
    // do something

    return NOTIFY_DONE;
}

static struct notifier_block reboot_notifier = {
    .notifier_call = &shutdown_notification,
    .priority = INT_MAX
};

int __init init_module(void) {
    ...
    register_reboot_notifier(&reboot_notifier);
    ...
}

// with legacy init_module(), the exit hook must be named cleanup_module()
void __exit cleanup_module(void) {
    ...
    unregister_reboot_notifier(&reboot_notifier);
    ...
}