thememika opened 7 months ago
Sorry, the fix was to first replay the journal using a machine with an old version of KVDO (and to shut down the VDO cleanly). Thanks for the update and for the integration with the Linux kernel!
Unfortunately, I have to reopen the issue because there is likely a bug related to the new (in-kernel) version of KVDO. More specifically, once you attempt to bring up a VDO device which uses the old journal format and is dirty, and you get the "Unsupported component" error, it is no longer possible to correctly replay the journal, even when you go back to the old (out-of-tree) KVDO. The device will always be treated as clean, regardless of whether it is dirty or not. This leads to an attempt at read-write access to a dirty device without any replay, which in turn results in errors very soon. Example with one of my devices:
[ 946.149687] xfs filesystem being mounted at /******* supports timestamps until 2038-01-19 (0x7fffffff)
...
[ 1038.482749] device-mapper: vdo: dm-vdo1:cpuQ0: Completing read vio for LBN 65663121 with error after read_data_vio: VDO Status: Compressed block fragment is invalid (1483)
[ 1038.482773] device-mapper: vdo: dm-vdo1:cpuQ0: vdo_status_to_errno: mapping internal status code 1483 (VDO_INVALID_FRAGMENT: VDO Status: Compressed block fragment is invalid) to EIO
...
[ 1038.482930] XFS (dm-9): metadata I/O error in "xfs_btree_read_buf_block+0xb7/0x160" at daddr 0x1f31b328 len 8 error 5
I have just lost two of my devices like that, and as far as I can see, the only option now is a read-only rebuild, copying to another block device (which is R/W), and then a hard fsck, which I expect will come with moderate data loss.
Other devices, which were not dirty back when I had the old KVDO, are operating correctly and stably.
One more thing to note: the VDO stats are broken (in-kernel KVDO; the names of my devices are replaced with "****"):
$ su -c vdostats
Device 1k-blocks Used Available Use% Space saving%
**** 0 0 0 -2147483648% 0%
**** 0 0 0 -2147483648% 0%
**** 0 0 0 -2147483648% 0%
**** 0 0 0 -2147483648% 0%
**** 0 0 0 -2147483648% 0%
**** 0 0 0 -2147483648% 0%
While we can live without the userspace VDO tools for some time, I believe that the issue described in the post above, affecting dirty devices with the old journal format, is severe. It poses an unpredictable risk to the correctness of devices, especially for those who weren't warned about the issue in any way.
Sorry again, it turned out that the broken dirty devices were easily fixable by a forced "rebuild". They are now R/W, give no errors, and most (if not all) of the data is likely safe and untouched.
To rebuild the vdo device, you first need to stop it (remove the devmapper entry).
Then, I used the tools from vdo-devel. After they are built, they provide the ./src/c++/vdo/bin/vdoforcerebuild binary.
I executed that binary with the path to my VDOs' physical (backing) devices. As I understand, it just sets a flag requesting a rebuild; it doesn't perform the rebuild by itself.
Then, when you attempt to bring up the devices, the rebuild may take a significant amount of time, but in the end the device will be absolutely OK and up for R/W.
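For reference, here is a minimal sketch of that recovery sequence for a plain device-mapper VDO target. The target name vdo1, the backing device /dev/sdb1, and the saved table file /root/vdo1.table are placeholders of mine, not values from this thread:

    # Stop the VDO target so nothing holds the backing device open.
    dmsetup remove vdo1

    # Mark the volume for a forced (read-only) rebuild; this only sets a flag,
    # the actual rebuild happens the next time the device is started.
    ./src/c++/vdo/bin/vdoforcerebuild /dev/sdb1

    # Re-create the target with the same table it had before; the rebuild runs
    # during startup and may take a while on a large volume.
    dmsetup create vdo1 --table "$(cat /root/vdo1.table)"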
Thanks!
UPD: Still, I believe that this problem is a bug which needs to be fixed.
Hi @thememika, can you share which version you were using prior to upgrading to the in-kernel version? I'm glad you were able to discover the forced rebuild operation and get back up and running.
I'd like for us to reproduce this situation so we can better understand it and figure out what needs to be done.
Thanks, -Andy
Hi @rhawalsh, thanks for your reply! I was using this version before:
kvdo: modprobe: loaded version 8.2.3.3
After that, I moved (from linux-6.8) to linux 6.9-rc2 with its built-in KVDO.
To reproduce the issue, you can simply try to bring up a dirty device with the old journal format on the in-kernel KVDO; after the "Unsupported component" error, the journal can no longer be replayed and the device has to be recovered with a forced rebuild. Thanks!
Hi @thememika. We'll take a look at it. Thank you for the report!
Just to clarify, did you do the forced rebuild on the old (8.2.3.3) version, or on the in-kernel 6.9-rc2 version?
Thanks for the attention, @rhawalsh. The forced rebuild was done on the 6.9-rc2 kernel. After that, my devices started working, so I am staying on the new version.
Hi @thememika, just wanted to give a quick update.
I ran through a few scenarios, and I was able to reproduce what you're seeing. I'm still playing around with things to figure out how best to document this. However, the bottom line is to always make sure you're cleanly shutting down the VDO volume(s) before making any changes. That said, in my testing between 8.2.3.3 and 6.9-rc2 on Fedora Rawhide, I was able to repair the volume in both directions after a dirty shutdown by using the vdoforcerebuild utility as mentioned.
My reproduction environment was done using an iSCSI target and two initiators.
I went through a few scenarios:
1. Graceful shutdown on RHEL9 followed by transfer to Rawhide.
2. Graceful shutdown on Rawhide followed by transfer to RHEL9.
3. Forced reboot (echo b > /proc/sysrq-trigger) on RHEL9 followed by transfer to Rawhide.
4. Forced reboot (echo b > /proc/sysrq-trigger) on Rawhide followed by transfer to RHEL9.

Scenario 1:
1. Create the VDO volume (vgcreate vg_name /dev/sda; lvcreate -l 100%FREE -V 1T -n vdo_name vg_name)
2. Create a filesystem on it (mkfs.xfs -K /dev/vg_name/vdo_name) and mount it (mount /mnt/vdo)
3. Copy some data onto it (cp -a /usr /mnt/vdo)
4. Unmount it and deactivate the VDO volume (lvchange -an /dev/vg_name/vdo_name)
5. Transfer the LUN to the other initiator and verify the volume and its data show up (lsblk)

Scenario 2 is the same as Scenario 1, but with the initiators swapped.
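Put together, the Scenario 1 sequence looks roughly like the commands below, run on the first initiator, as a sketch of the steps above. It assumes /dev/sda is the iSCSI LUN, /mnt/vdo already exists, and it adds --type vdo to the lvcreate call (needed for an LVM-managed VDO volume), which isn't visible in the quoted commands:

    # Create the VG on the shared LUN and a 1T thinly provisioned VDO LV on it.
    vgcreate vg_name /dev/sda
    lvcreate --type vdo -l 100%FREE -V 1T -n vdo_name vg_name

    # Create an XFS filesystem (skipping the initial discard with -K) and mount it.
    mkfs.xfs -K /dev/vg_name/vdo_name
    mount /dev/vg_name/vdo_name /mnt/vdo

    # Put some data on it.
    cp -a /usr /mnt/vdo

    # Cleanly stop: unmount and deactivate the VDO volume.
    umount /mnt/vdo
    lvchange -an /dev/vg_name/vdo_name

    # After moving the LUN to the other initiator, check that the volume shows up.
    lsblk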
Scenario 3 replaced the step where VDO is gracefully stopped with a forced reboot via echo b > /proc/sysrq-trigger, followed by a removal of the initiator's access to the LUN on the target to prevent a possible multiple-activation situation. Then, Initiator 2 logged into the target, where we could see the same errors you reported previously. After running vdoforcerebuild /dev/mapper/vg_name-vpool0_vdata followed by a deactivate/activate cycle (lvchange -an /dev/vg_name/vdo_name; lvchange -ay /dev/vg_name/vdo_name), the volume's normal operation resumed once the read-only rebuild completed. The remaining steps in Scenarios 1 and 2 continued from here.
Scenario 4 is the same as Scenario 3, but with the initiators swapped, once again.
I did a little bit more digging into the behavior that was experienced here, since a basic unclean shutdown alone shouldn't leave a VDO volume requiring a forced rebuild.
If I take the same setup as mentioned above, and set up a volume on a RHEL9 host, forcibly reset the host, and then let the system come back up, the VDO volume starts with an automatic recovery and no manual intervention required. It is specifically the act of trying to move the dirty volume from RHEL9 to the upstream version that causes this particular behavior.
I don't believe this is particularly a new thing with VDO volumes. We've always had an "upgrade" when moving from one version to the next. Though, I can't say that I've seen the issue where simply attempting to start a dirty volume on the new version would cause it to require a force-rebuild. That's something I need to talk with the team about a bit more.
First of all thank you for your efforts and work, this is just great!
On the bright side: performance seems improved :) I went from 2000 MB/s max throughput to 3500 MB/s (a non-VDO partition is still about 5000 MB/s, so I guess there is some room left for improvement). Max IOPS is about 400000, which is in line with the non-VDO partition.
On the not-so-bright side: at first everything just worked with 6.9, but after some unrelated reboots back to the old DKMS VDO and some fiddling with my system, I too faced the read-only problem mentioned above. What finally helped was, first, that I was lucky enough to have included the VDO userspace tools in my initramfs, and second, that I managed to figure out the right vdoforcerebuild command:
vdoforcerebuild /dev/mapper/neo-vpool0_vdata
You can forget about that neo part :) But the rest seems important: you don't point it to your virtual combined LV device but rather to the underlying data LV ... right?
In the end nothing was lost and I'm happy now :+1: Perhaps this is of use to others.
That is correct. vdoforcerebuild, and most of the other userspace tools, must operate directly on the pool_data device.
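For anyone else who needs this, here is a small sketch of how to find that hidden data sub-LV and run the rebuild against it. The volume group vg_name, pool name vpool0, and LV name vdo_name are placeholders matching the examples above; adjust them to your own layout:

    # List all LVs including hidden sub-LVs; the VDO backing device appears
    # as something like [vpool0_vdata].
    lvs -a -o lv_name,vg_name,lv_attr vg_name

    # Mark the hidden data sub-LV (not the virtual LV) for a forced rebuild.
    vdoforcerebuild /dev/mapper/vg_name-vpool0_vdata

    # Deactivate and reactivate the VDO LV; the read-only rebuild runs during activation.
    lvchange -an /dev/vg_name/vdo_name
    lvchange -ay /dev/vg_name/vdo_name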
Thanks for your kind words, and I'm glad it's working out for you.
After reading the 6.9-rc1 update notice from Linus, I was excited to learn that KVDO has finally been merged into the Linux kernel. But unfortunately, I'm not able to use any of my production VDO devices with it. The recovery journal is in the old format, and as I understand, there is no way to bring the devices up for R/W without re-creating them and copying terabytes of data.
I don't understand why you changed the journal format without writing conversion code for it. What is the reason behind this? I can't use the in-kernel KVDO because of that. Is it true that everyone will now have to re-create their VDO devices from scratch and copy all the data over?