cloudbase / wnbd

Windows Ceph RBD NBD driver
GNU Lesser General Public License v2.1

wnbd driver and qemu-nbd - DriveIO lockup and corruption #63

Closed hgkamath closed 1 year ago

hgkamath commented 2 years ago

I was doing some testing on WIN10-21H2-19044-1415/WNBD-0.2.2-4-g10c1fbe qemu-6.2.0-rc4/ExFAT/qcow2/NTFS

For complete details, see https://gitlab.com/qemu-project/qemu/-/issues/727#note_780895802

On Linux, only the VHDX format corrupts during expansion, while qcow2 and other virtual-disk formats seem fine. But on Windows, I found that even the qcow2 format can corrupt during expansion (TEST E). For this reason, I think this is a separate bug, distinct from the original, and wnbd may also be a factor. The Windows-only bug is in addition to the qemu-only bug, which is also lurking in there.

The bug is reproducible. A script for synthetic data generation is provided in an earlier comment in that issue filing (link).

It is possible that this bug is only in the Windows build of qemu-nbd, but the wnbd driver is also involved here. The semi-official driver was test-signed and still seemed to be under development. I guess the bug needs to be squashed by troubleshooting and closing in on it from both sides.

The 5 test-cases can also be part of wnbd testing.

Q) Is there a recommended version of qemu-nbd that is known to work reliably with WNBD, and have the two been well tested together?

petrutlucian94 commented 1 year ago

For what it's worth, the issue doesn't occur when connecting to remote hosts, even with cache.direct=off. I double-checked by running wnbd in a VM and connecting to a qemu NBD server that ran on the physical host.

This further suggests that the issue comes from the fact that we have buffering at the wnbd disk level as well as in the underlying NBD daemon, which receives the actual wnbd IO requests.

There's probably not much that can be done on the qemu side, other than supporting cache.direct=on with other image formats such as qcow2, assuming that it complies with the IO alignment requirements: https://learn.microsoft.com/en-us/windows/win32/fileio/file-buffering#alignment-and-file-access-requirements. vhd/x images can be attached by Windows natively (e.g. using mount-vhd), so no need to use qemu-nbd/wnbd in that case.
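To make the alignment requirement concrete, here is a minimal sketch of the check an unbuffered IO path would need to pass. It only covers the offset and length rules from the linked page (both must be whole multiples of the sector size); the buffer address alignment rule is not modeled. The function names and the 512-byte default sector size are assumptions for illustration, not anything taken from qemu or wnbd.

```python
def is_unbuffered_io_aligned(offset: int, length: int, sector_size: int = 512) -> bool:
    """Return True if both offset and length are whole multiples of sector_size.

    Hypothetical helper: sector_size=512 is an assumed default; a real caller
    would query the volume for its actual sector size.
    """
    if sector_size <= 0:
        raise ValueError("sector_size must be positive")
    return offset % sector_size == 0 and length % sector_size == 0


def widen_to_sector_bounds(offset: int, length: int, sector_size: int = 512):
    """Expand the byte range [offset, offset + length) outward to sector boundaries.

    Returns (aligned_offset, aligned_length). Hypothetical helper showing how an
    unaligned request could be turned into an aligned one by reading extra bytes.
    """
    if sector_size <= 0:
        raise ValueError("sector_size must be positive")
    start = (offset // sector_size) * sector_size
    # Ceiling division rounds the end of the range up to the next boundary.
    end = -(-(offset + length) // sector_size) * sector_size
    return start, end - start
```

A format driver whose metadata or data writes are not naturally sector aligned would need something like the second helper (plus read-modify-write for writes) before it could be opened without intermediate buffering.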

Talking about natively mounted vhd/x images, it's highly likely that Windows uses ZwCreateFile with FILE_NO_INTERMEDIATE_BUFFERING behind the scenes, especially considering that the vhd/x internal structures are sector aligned.

By the way, you may also hit this with the latest qemu MSI: https://gitlab.com/qemu-project/qemu/-/issues/1695.