Open hgkamath opened 1 year ago
Hi,
Thanks for opening this issue.
nb: IMHO https://github.com/cloudbase/wnbd/issues/63 still happens; presently it is closed as it is hard to identify where the fault is. qemu on Windows is a bit buggy, but I argue that even if the nbd-server is buggy, wnbd should detect the lockup situation, perhaps eject the disk and bail out.
The following is an excerpt from https://github.com/cloudbase/wnbd/issues/63#issuecomment-997204171. Uninstalling and reinstalling the driver is the only way to unstick the situation. At this point, all not-responding processes/apps come back alive. Until I discovered this, I was only force shutting down the laptop. The following don't work: ctrl-C or taskmgr-end-task on qemu-nbd or xcopy, or attempting wnbd-client unmap.
Storport already detects IO timeouts and issues lun resets. When receiving a lun reset, we're simply emptying the IO queues. In this case, I think we should actually reset the NBD connection, which would probably fix the problem. I'll prepare a PR in the upcoming weeks.
By the way, can you double check what happens when connecting to the same nbd server using a linux nbd client?
In the meantime, we added an adapter reset command. It's more convenient than having to reinstall the driver.
wnbd-client.exe reset-adapter --hard-disconnect-mappings
I'll give that a try.
While what you say about reset-adapter is true, the shell addresses the problem when the exe can't even be started.
All command-line commands, including reset-adapter, should be doable from the wnbd-shell.
I might need to update my wnbd driver.
PS C:\lmgmt\M_Capella-PC\scripts> D:\vstorage\nbd\wnbd_client\wnbd-client.exe version
wnbd-client.exe: 0.2.2-11-g3dbec5e
libwnbd.dll: 0.2.2-11-g3dbec5e
wnbd.sys: 0.2.2-11-g3dbec5e
PS C:\lmgmt\M_Capella-PC\scripts> D:\vstorage\nbd\wnbd_client\wnbd-client.exe
wnbd-client commands:
version | -v Get the client, library and driver version.
help | -h | --help List all commands or get more details about a specific
command.
list | ls List WNBD disks.
show Show detailed disk information.
map Create a new disk mapping, connecting to the specified NBD
server.
unmap | rm Remove disk mapping.
stats Get disk stats.
list-opt List driver options.
get-opt Get driver option.
set-opt Set driver option.
reset-opt Reset driver option.
install-driver Install WNBD driver and create its adapter.
uninstall-driver Hard remove all disk mappings and adapters and uninstall all
WNBD driver instances.
The problem with the webpage https://cloudbase.it/ceph-for-windows/ is that its link https://cloudba.se/ceph-win-latest-quincy downloads the file ceph_quincy_beta.msi, whose file-properties/details-tab has a date-created field of 4/14/2023 2:09PM. But this could just be the download date (today). Otherwise the webpage/filename does not give a clue about a version update.
Even then, it's not always true that when the ceph drivers are updated, there is an update to the wnbd driver.
Is there a surefire way for a user to determine whether an updated wnbd driver is present? What if a user wanted to download a specific older version?
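One partial answer, for whatever is already installed, is to compare what `wnbd-client.exe version` reports before and after an upgrade. The following is a minimal Python sketch, assuming the version strings follow the `tag-commits-gSHA` shape seen in the outputs pasted in this thread; the function names are made up for illustration and are not part of wnbd.

```python
import re

# Sample outputs copied from the `wnbd-client.exe version` runs in this thread.
OLD = """wnbd-client.exe: 0.2.2-11-g3dbec5e
libwnbd.dll: 0.2.2-11-g3dbec5e
wnbd.sys: 0.2.2-11-g3dbec5e"""

NEW = """wnbd-client.exe: 0.4.1-10-g5c5239c
libwnbd.dll: 0.4.1-10-g5c5239c
wnbd.sys: 0.4.1-10-g5c5239c"""

# Assumed line shape: "<component>: <major>.<minor>.<patch>-<commits>-g<sha>"
_LINE = re.compile(
    r"^(?P<name>\S+):\s*(?P<ver>\d+\.\d+\.\d+)-(?P<commits>\d+)-g(?P<sha>[0-9a-f]+)$"
)


def parse_versions(text):
    """Map component name -> ((major, minor, patch, commits), sha)."""
    result = {}
    for line in text.splitlines():
        match = _LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a version line
        numbers = tuple(int(part) for part in match["ver"].split("."))
        numbers += (int(match["commits"]),)
        result[match["name"]] = (numbers, match["sha"])
    return result


def is_consistent(versions):
    """True when client, library and driver all report the same build."""
    return len(set(versions.values())) == 1


def is_newer(candidate, installed, component="wnbd.sys"):
    """Compare the numeric part of one component between two parsed outputs."""
    return candidate[component][0] > installed[component][0]
```

This only tells you what is on the machine; it does not say which wnbd build a given ceph MSI contains without installing or extracting it.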
Extracted updated wnbd-driver as of 20230414
PS C:\Windows\system32> D:\vstorage\nbd\wnbd_client\wnbd-client.exe version
wnbd-client.exe: 0.4.1-10-g5c5239c
libwnbd.dll: 0.4.1-10-g5c5239c
wnbd.sys: 0.4.1-10-g5c5239c
PS C:\Windows\system32> D:\vstorage\nbd\wnbd_client\wnbd-client.exe
wnbd-client commands:
version | -v Get the client, library and driver version.
help | -h | --help List all commands or get more details about a specific
command.
list | ls List WNBD disks.
show Show detailed disk information.
map Create a new disk mapping, connecting to the specified NBD
server.
unmap | rm Remove disk mapping.
stats Get disk stats.
list-opt List driver options.
get-opt Get driver option.
set-opt Set driver option.
reset-opt Reset driver option.
install-driver Install WNBD driver and create its adapter.
uninstall-driver Hard remove all disk mappings and adapters and uninstall all
WNBD driver instances.
reset-adapter Resets the WNBD adapter using PnP. Existing disk mappings need
to be removed.
By the way, can you double check what happens when connecting to the same nbd server using a linux nbd client?
btw, recently, in qemu-project, a vhdx corruption bug was resolved.
https://gitlab.com/qemu-project/qemu/-/issues/727#note_1347303636
In that comment, you can see that a local qemu-storage-daemon on Linux works well with linux local nbd-client.
On my single laptop, I don't think I have a way to do an nbd-share from a qemu-storage-daemon of a Windows build on a Windows machine and nbd-connect to that from an nbd-client on a Linux machine. Involving a VM is perhaps not the right way to test this. But, as they are platform builds from the same code, they should mostly have the same effects, but for a little uncertainty in differences due to the file-access layer in Windows.
I learnt 2 things:
powershell.exe -v5.1.19041.2673, it won't load/start.
cmd.exe, it can load.
cmd.exe windows open.
C:\Windows\system32>D:\vstorage\nbd\wnbd_client\wnbd-client.exe reset-adapter
libwnbd.dll!WnbdResetAdapter WARNING Could not reset WNBD adapter. Device in use, operation vetoed.
libwnbd.dll!WnbdResetAdapterEx WARNING Could not reset adapter, device busy. Time elapsed: 0.18s, time left: 9.8s.
libwnbd.dll!WnbdResetAdapter WARNING Could not reset WNBD adapter. Device in use, operation vetoed.
libwnbd.dll!WnbdResetAdapterEx WARNING Could not reset adapter, device busy. Time elapsed: 1.38s, time left: 8.6s.
libwnbd.dll!WnbdResetAdapter WARNING Could not reset WNBD adapter. Device in use, operation vetoed.
libwnbd.dll!WnbdResetAdapterEx WARNING Could not reset adapter, device busy. Time elapsed: 2.56s, time left: 7.4s.
libwnbd.dll!WnbdResetAdapter WARNING Could not reset WNBD adapter. Device in use, operation vetoed.
libwnbd.dll!WnbdResetAdapterEx WARNING Could not reset adapter, device busy. Time elapsed: 3.76s, time left: 6.2s.
libwnbd.dll!WnbdResetAdapter WARNING Could not reset WNBD adapter. Device in use, operation vetoed.
libwnbd.dll!WnbdResetAdapterEx WARNING Could not reset adapter, device busy. Time elapsed: 4.95s, time left: 5.1s.
libwnbd.dll!WnbdResetAdapter WARNING Could not reset WNBD adapter. Device in use, operation vetoed.
libwnbd.dll!WnbdResetAdapterEx WARNING Could not reset adapter, device busy. Time elapsed: 6.13s, time left: 3.9s.
libwnbd.dll!WnbdResetAdapter WARNING Could not reset WNBD adapter. Device in use, operation vetoed.
libwnbd.dll!WnbdResetAdapterEx WARNING Could not reset adapter, device busy. Time elapsed: 7.32s, time left: 2.7s.
libwnbd.dll!WnbdResetAdapter WARNING Could not reset WNBD adapter. Device in use, operation vetoed.
libwnbd.dll!WnbdResetAdapterEx WARNING Could not reset adapter, device busy. Time elapsed: 14.44s, time left: -4.4s.
... gives up after about 8 times.
C:\Windows\system32>
C:\Windows\system32>D:\vstorage\nbd\wnbd_client\wnbd-client.exe uninstall-driver
libwnbd.dll!WnbdRemoveAllDisks INFO Hard removing WNBD disk: gkpics01
libwnbd.dll!RemoveWnbdAdapterDevice INFO Removing WNBD adapter device. Hardware id: root\wnbd. Class GUID: {4D36E97B-E325-11CE-BFC1-08002BE10318}
libwnbd.dll!CleanDrivers INFO Removing WNBD driver: oem41.inf
C:\Windows\system32>
... now I get back control.
Oops, I forgot the --hard-disconnect-mappings argument. Given that the exe was able to start and run, I think it will work. I'll try that next time. [EDIT] I did get to try it; I think it said 'operation vetoed', but I couldn't copy/save the error texts.
I will log further details pertaining to this stuck situation in #63-comment-1508390090
We found out that the IO deadlock was caused by having Windows caching enabled on the WNBD disk side as well as the underlying local NBD server side. Disabling caching on the qemu-storage-daemon side solved the issue (cache.direct=on). This does not affect Ceph.
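For anyone landing here with the same symptom, below is a rough Python sketch of where `cache.direct=on` would go when building a qemu-storage-daemon command line. The exact command used in the original setup is not shown in this thread, so the node names, the vhdx format layer, the address and the image path are all assumptions for illustration; only `cache.direct=on` comes from the fix described above.

```python
def storage_daemon_args(image_path, export_name, host="127.0.0.1", port=10809):
    """Build a qemu-storage-daemon argument list with host caching bypassed.

    Note: a comma inside image_path would need to be doubled (",,") for
    QEMU's option parser; this sketch does not handle that.
    """
    return [
        "qemu-storage-daemon",
        # Protocol-level node: the file on disk, opened with direct I/O.
        "--blockdev",
        f"driver=file,node-name=file0,filename={image_path},cache.direct=on",
        # Format-level node on top of it (vhdx is assumed here).
        "--blockdev",
        "driver=vhdx,node-name=disk0,file=file0",
        # NBD server listening on a local TCP port.
        "--nbd-server",
        f"addr.type=inet,addr.host={host},addr.port={port}",
        # Export the format node under the given name, writable.
        "--export",
        f"type=nbd,id=export0,node-name=disk0,name={export_name},writable=on",
    ]
```

The resulting list can be passed to `subprocess.run` as-is, which avoids shell quoting problems with Windows paths.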
In the meantime, the nbd client functionality has been moved to libwnbd and wnbd-client map became a blocking command. wnbd-client has just a few simple commands, so adding an interactive shell mode wouldn't help much IMHO.
Description:
It is desirable to have a shell interface mode for wnbd-client, just as qemu, guestfish, do.
It is desirable that the no-arg invocation starts the shell. Presently the no-arg invocation just prints the help. Example output:
The shell mode is just a simple read, eval-string-array, print loop.
exit is the only command that is newly introduced; it terminates the loop, exits the wnbd-shell process and returns to the outside shell.
Reason:
The advantage of a shell interface mode is that the executable is already loaded in memory, running as a process, waiting for the user's stdin input. This is unlike starting the application with arguments on the command prompt, wherein the binary has to be loaded from a drive and a new process has to be created.
When a drive lockup happens, the OS might not be able to load and run a new command, which prevents the recovery described below. The save-from-trouble strategy is to have an open terminal window with the wnbd shell already running.
nb: IMHO #63 still happens; presently it is closed as it is hard to identify where the fault is. qemu on Windows is a bit buggy, but I argue that even if the nbd-server is buggy, wnbd should detect the lockup situation, perhaps eject the disk and bail out.
The following is an excerpt from #63-comment-997204171. Uninstalling and reinstalling the driver is the only way to unstick the situation. At this point, all not-responding processes/apps come back alive. Until I discovered this, I was only force shutting down the laptop. The following don't work: ctrl-C or taskmgr-end-task on qemu-nbd or xcopy, or attempting wnbd-client unmap.
I think this was possible because the Windows OS was not so stuck that wnbd-client could not run. The drive lockups can be so bad that even that's not possible. When the stuck state happens, it's possible to switch between open windows that take user input. But the moment an application has to access the disk (such as how a browser always does, or when pressing ctrl-S in notepad), the GUI becomes stuck. The Windows taskbar becomes stuck quickly for the same reason. It's possible to start taskmgr via ctrl-alt-delete, but taskmgr can't really load any information in its GUI.
It's really important that, when in shell mode, wnbd-client does not access any disk file or cause disk access, not even configuration files etc. Otherwise it will also get stuck. This is unless the command itself requires a file argument, like install-driver. For this reason I don't recommend implementing history, as history would require maintaining a history log on disk. Or maybe add an option to skip reading and writing disk-state.
There's no guarantee that shell mode will be able to save the lockup situation. But it's worth a try, and even if it doesn't, it's a feature addition that is harmless, small, simple and low maintenance.
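To make the request concrete, here is a minimal sketch of such a loop, written in Python purely for illustration (wnbd-client itself is not written in Python). The `commands` mapping and its handlers are hypothetical stand-ins for the existing wnbd-client commands. The loop keeps everything in memory: no history file, no config file, and `exit` is the only command it adds.

```python
import sys


def run_shell(commands, stdin=None, stdout=None, prompt="wnbd> "):
    """Read a line, split it into an argument array, dispatch, print; repeat.

    `commands` maps a command name to a callable taking the remaining
    arguments and returning a string. Nothing here touches the disk.
    """
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    while True:
        stdout.write(prompt)
        stdout.flush()
        line = stdin.readline()
        if not line:  # end of input behaves like exit
            break
        argv = line.split()  # plain whitespace split; quoting is not handled
        if not argv:
            continue  # empty line, just prompt again
        name, args = argv[0], argv[1:]
        if name == "exit":  # the only command the shell itself introduces
            break
        handler = commands.get(name)
        if handler is None:
            stdout.write(f"unknown command: {name}\n")
            continue
        stdout.write(handler(args) + "\n")


if __name__ == "__main__":
    # Placeholder handlers; a real shell would call into the existing
    # command implementations instead.
    run_shell({
        "list": lambda args: "(would list WNBD disks)",
        "unmap": lambda args: f"(would unmap {' '.join(args)})",
    })
```

Because the dispatch table is built once at startup, running a command from the prompt needs no new process and no binary load, which is the whole point of the request.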