cloudbase / wnbd

Windows Ceph RBD NBD driver
GNU Lesser General Public License v2.1

[RFE] create a shell interface mode for wnbd-client #120

Open hgkamath opened 1 year ago

hgkamath commented 1 year ago

Description:

It is desirable to have a shell interface mode for wnbd-client, just as qemu and guestfish do.
It is desirable that the no-arg invocation starts the shell. Presently, the no-arg invocation just prints the help. Example output:

PS> D:\vstorage\nbd\wnbd_client\wnbd-client.exe <enter>
WNBD> help<enter>
<show help>
WNBD> version<enter>
<show version>
WNBD> uninstall-driver<enter>
<uninstall the driver>
WNBD> exit<enter>
<say goodbye, and return to shell>
PS> 

The shell mode is just a simple read, eval-string-array, print loop.
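A minimal sketch of such a loop, written in Python purely for illustration (wnbd-client itself is not Python, and the names run_shell and demo are hypothetical). It assumes each input line is split on whitespace into an argv-style string array and dispatched to a handler; it deliberately opens no files, so there is no history, config, or log on disk.

```python
import io
import sys


def run_shell(commands, stdin=sys.stdin, stdout=sys.stdout, prompt="WNBD> "):
    """Read a line, split it into an argv-style list, dispatch, print.

    `commands` maps a command name to a callable taking the argument
    list and returning the text to print. Nothing here touches disk.
    """
    while True:
        stdout.write(prompt)
        stdout.flush()
        line = stdin.readline()
        if not line:  # EOF (closed pipe or terminal) ends the shell
            break
        argv = line.split()  # simplistic: no quoting support
        if not argv:
            continue  # empty line: just show the prompt again
        name, args = argv[0], argv[1:]
        if name == "exit":
            stdout.write("bye\n")
            break
        handler = commands.get(name)
        if handler is None:
            stdout.write(f"unknown command: {name}\n")
            continue
        stdout.write(handler(args) + "\n")


def demo():
    """Replay part of the example session against in-memory streams."""
    commands = {
        "help": lambda args: "<show help>",
        "version": lambda args: "<show version>",
    }
    out = io.StringIO()
    run_shell(commands, stdin=io.StringIO("help\nversion\nexit\n"), stdout=out)
    return out.getvalue()
```

In a real client, the handlers would be the same functions that the command-line entry point already calls, so every existing command would be reachable from the prompt.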

Reason: The advantage of a shell interface mode is that the executable is already loaded in memory, running as a process and waiting for the user's stdin input. This is unlike starting the application with arguments from the command prompt, where the binary has to be loaded from a drive and a new process has to be created.
When a drive lockup happens, the OS might not be able to load and run a new command. The save-from-trouble strategy is therefore to keep a terminal window open with the wnbd shell already running.

nb: IMHO #63 still happens; it is presently closed because it is hard to identify where the fault is. qemu on Windows is a bit buggy, but I argue that even if the nbd-server is buggy, wnbd should detect the lockup situation, perhaps eject the disk and bail out.

The following is an excerpt from #63-comment-997204171. Uninstalling and reinstalling the driver is the only way to unstick the situation. At this point, all not-responding processes/apps come back alive. Until I discovered this, I was only force shutting down the laptop. The following don't work: ctrl-C or taskmgr end-task on qemu-nbd or xcopy, or attempting wnbd-client unmap.

I think this was possible because the Windows OS was not so stuck that wnbd-client could not run. The drive lockups can be so bad that even that's not possible. When the stuck state happens, it's possible to switch between open windows that take user input. But the moment an application has to access the disk (as a browser always does, or when pressing ctrl-S in notepad), the GUI becomes stuck. The Windows taskbar becomes stuck quickly for the same reason. It's possible to start taskmgr via ctrl-alt-delete, but taskmgr can't really load any information in its GUI.

It's really important that, when in shell mode, wnbd-client does not access any disk file or cause any disk access, not even for configuration files. Otherwise it will also get stuck. The exception is when the command itself requires a file argument, like install-driver. For this reason I don't recommend implementing history, as history would require maintaining a history log on disk. Or maybe add an option to skip reading and writing disk state.

There's no guarantee that shell mode will be able to save the lockup situation. But it's worth a try, and even if it doesn't, it's a feature addition that is harmless, small, simple and low maintenance.

petrutlucian94 commented 1 year ago

Hi,

Thanks for opening this issue.

nb: IMHO https://github.com/cloudbase/wnbd/issues/63 still happens; it is presently closed because it is hard to identify where the fault is. qemu on Windows is a bit buggy, but I argue that even if the nbd-server is buggy, wnbd should detect the lockup situation, perhaps eject the disk and bail out.

The following is an excerpt from https://github.com/cloudbase/wnbd/issues/63#issuecomment-997204171. Uninstalling and reinstalling the driver is the only way to unstick the situation. At this point, all not-responding processes/apps come back alive. Until I discovered this, I was only force shutting down the laptop. The following don't work: ctrl-C or taskmgr end-task on qemu-nbd or xcopy, or attempting wnbd-client unmap.

Storport already detects IO timeouts and issues LUN resets. When receiving a LUN reset, we simply empty the IO queues. In this case, I think we should actually reset the NBD connection, which would probably fix the problem. I'll prepare a PR in the upcoming weeks.

By the way, can you double check what happens when connecting to the same nbd server using a linux nbd client?

In the meantime, we added an adapter reset command. It's more convenient than having to reinstall the driver.

wnbd-client.exe reset-adapter --hard-disconnect-mappings

hgkamath commented 1 year ago

I'll give that a try. While what you say about reset-adapter is true, the shell addresses the problem of the exe not even being able to start.
All command-line commands, including reset-adapter, should be doable from the wnbd shell.

I might need to update my wnbd driver.

PS C:\lmgmt\M_Capella-PC\scripts> D:\vstorage\nbd\wnbd_client\wnbd-client.exe version
wnbd-client.exe: 0.2.2-11-g3dbec5e
libwnbd.dll: 0.2.2-11-g3dbec5e
wnbd.sys: 0.2.2-11-g3dbec5e

PS C:\lmgmt\M_Capella-PC\scripts> D:\vstorage\nbd\wnbd_client\wnbd-client.exe
wnbd-client commands:

version | -v       Get the client, library and driver version.
help | -h | --help List all commands or get more details about a specific
                   command.
list | ls          List WNBD disks.
show               Show detailed disk information.
map                Create a new disk mapping, connecting to the specified NBD
                   server.
unmap | rm         Remove disk mapping.
stats              Get disk stats.
list-opt           List driver options.
get-opt            Get driver option.
set-opt            Set driver option.
reset-opt          Reset driver option.
install-driver     Install WNBD driver and create its adapter.
uninstall-driver   Hard remove all disk mappings and adapters and uninstall all
                   WNBD driver instances.

The problem with the webpage https://cloudbase.it/ceph-for-windows/ is that it links to
https://cloudba.se/ceph-win-latest-quincy
which downloads the file
ceph_quincy_beta.msi
whose file-properties/details tab shows a date-created field of 4/14/2023 2:09PM.
But this could be just the download date (today).
Otherwise, neither the webpage nor the filename gives a clue about a version update.
Even then, it's not always true that when the ceph drivers are updated, the wnbd driver is updated too.

Is there a surefire way for a user to determine whether an updated wnbd driver is available? What if a user wanted to download a specific older version?


Extracted updated wnbd-driver as of 20230414

PS C:\Windows\system32>  D:\vstorage\nbd\wnbd_client\wnbd-client.exe version 
wnbd-client.exe: 0.4.1-10-g5c5239c
libwnbd.dll: 0.4.1-10-g5c5239c
wnbd.sys: 0.4.1-10-g5c5239c
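The version strings in these transcripts look like git-describe output (tag, commits since the tag, abbreviated commit hash); that reading is an assumption, not something the project states here. Under that assumption, a small hypothetical helper (parse_version_output is not part of wnbd) can compare the output of two "wnbd-client.exe version" runs. It only tells you whether one installed build is newer than another; it does not answer whether a newer download exists.

```python
import re

# "<component>: <major>.<minor>.<patch>[-<commits>-g<hash>]"
# Assumed format, inferred from the transcripts in this thread.
_LINE = re.compile(
    r"^(\S+):\s*(\d+)\.(\d+)\.(\d+)(?:-(\d+)-g([0-9a-f]+))?\s*$"
)


def parse_version_output(output):
    """Map each component name to ((major, minor, patch, commits), hash).

    Lines that do not look like a version line (prompts, blanks) are
    skipped. The hash is None for a plain tagged release.
    """
    versions = {}
    for line in output.splitlines():
        match = _LINE.match(line)
        if match is None:
            continue
        name, major, minor, patch, commits, commit = match.groups()
        key = (int(major), int(minor), int(patch), int(commits or 0))
        versions[name] = (key, commit)
    return versions


OLD = """wnbd-client.exe: 0.2.2-11-g3dbec5e
libwnbd.dll: 0.2.2-11-g3dbec5e
wnbd.sys: 0.2.2-11-g3dbec5e
"""

NEW = """wnbd-client.exe: 0.4.1-10-g5c5239c
libwnbd.dll: 0.4.1-10-g5c5239c
wnbd.sys: 0.4.1-10-g5c5239c
"""

old_driver = parse_version_output(OLD)["wnbd.sys"]
new_driver = parse_version_output(NEW)["wnbd.sys"]
# Tuples compare element by element, so the tag is compared first
# and the commit count only breaks ties within the same tag.
driver_was_updated = new_driver[0] > old_driver[0]
```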

PS C:\Windows\system32> D:\vstorage\nbd\wnbd_client\wnbd-client.exe
wnbd-client commands:

version | -v       Get the client, library and driver version.
help | -h | --help List all commands or get more details about a specific
                   command.
list | ls          List WNBD disks.
show               Show detailed disk information.
map                Create a new disk mapping, connecting to the specified NBD
                   server.
unmap | rm         Remove disk mapping.
stats              Get disk stats.
list-opt           List driver options.
get-opt            Get driver option.
set-opt            Set driver option.
reset-opt          Reset driver option.
install-driver     Install WNBD driver and create its adapter.
uninstall-driver   Hard remove all disk mappings and adapters and uninstall all
                   WNBD driver instances.
reset-adapter      Resets the WNBD adapter using PnP. Existing disk mappings need
                   to be removed.

hgkamath commented 1 year ago

By the way, can you double check what happens when connecting to the same nbd server using a linux nbd client?

btw, recently, in qemu-project, a vhdx corruption bug was resolved.
https://gitlab.com/qemu-project/qemu/-/issues/727#note_1347303636
In that comment, you can see that a local qemu-storage-daemon on Linux works well with linux local nbd-client.

On my single laptop, I don't think I have a way to share an nbd export from a Windows build of qemu-storage-daemon on a Windows machine and connect to it from an nbd-client on a Linux machine. Involving a VM is perhaps not the right way to test this. But, as they are platform builds of the same code, they should mostly behave the same, apart from a little uncertainty due to differences in the file-access layer on Windows.

hgkamath commented 1 year ago

I learnt 2 things:

  1. When in the stuck state, if wnbd-client is executed from within powershell.exe v5.1.19041.2673, it won't load/start.
    But if wnbd-client is started from within cmd.exe, it can load.
    I'm unsure how and why this is the case. It would be interesting to know.
    What this means is that, at least for my present purposes, I can still do without a shell mode.
    This does not mean shell mode will never be necessary: what if another day brings an even more serious stuck situation in which even cmd isn't helpful?
    Lesson: keep a few administrative-privilege cmd.exe windows open.
  2. As seen from the logs below, reset-adapter is not powerful enough to force its way through and unstick the situation.
    But uninstall-driver can.

C:\Windows\system32>D:\vstorage\nbd\wnbd_client\wnbd-client.exe reset-adapter
libwnbd.dll!WnbdResetAdapter WARNING Could not reset WNBD adapter. Device in use, operation vetoed.
libwnbd.dll!WnbdResetAdapterEx WARNING Could not reset adapter, device busy. Time elapsed: 0.18s, time left: 9.8s.
libwnbd.dll!WnbdResetAdapter WARNING Could not reset WNBD adapter. Device in use, operation vetoed.
libwnbd.dll!WnbdResetAdapterEx WARNING Could not reset adapter, device busy. Time elapsed: 1.38s, time left: 8.6s.
libwnbd.dll!WnbdResetAdapter WARNING Could not reset WNBD adapter. Device in use, operation vetoed.
libwnbd.dll!WnbdResetAdapterEx WARNING Could not reset adapter, device busy. Time elapsed: 2.56s, time left: 7.4s.
libwnbd.dll!WnbdResetAdapter WARNING Could not reset WNBD adapter. Device in use, operation vetoed.
libwnbd.dll!WnbdResetAdapterEx WARNING Could not reset adapter, device busy. Time elapsed: 3.76s, time left: 6.2s.
libwnbd.dll!WnbdResetAdapter WARNING Could not reset WNBD adapter. Device in use, operation vetoed.
libwnbd.dll!WnbdResetAdapterEx WARNING Could not reset adapter, device busy. Time elapsed: 4.95s, time left: 5.1s.
libwnbd.dll!WnbdResetAdapter WARNING Could not reset WNBD adapter. Device in use, operation vetoed.
libwnbd.dll!WnbdResetAdapterEx WARNING Could not reset adapter, device busy. Time elapsed: 6.13s, time left: 3.9s.
libwnbd.dll!WnbdResetAdapter WARNING Could not reset WNBD adapter. Device in use, operation vetoed.
libwnbd.dll!WnbdResetAdapterEx WARNING Could not reset adapter, device busy. Time elapsed: 7.32s, time left: 2.7s.
libwnbd.dll!WnbdResetAdapter WARNING Could not reset WNBD adapter. Device in use, operation vetoed.
libwnbd.dll!WnbdResetAdapterEx WARNING Could not reset adapter, device busy. Time elapsed: 14.44s, time left: -4.4s.
... gives up after about 8 times.
C:\Windows\system32>

C:\Windows\system32>D:\vstorage\nbd\wnbd_client\wnbd-client.exe uninstall-driver
libwnbd.dll!WnbdRemoveAllDisks INFO Hard removing WNBD disk: gkpics01
libwnbd.dll!RemoveWnbdAdapterDevice INFO Removing WNBD adapter device. Hardware id: root\wnbd. Class GUID: {4D36E97B-E325-11CE-BFC1-08002BE10318}
libwnbd.dll!CleanDrivers INFO Removing WNBD driver: oem41.inf
C:\Windows\system32>
... now I get back control.
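In the reset-adapter log above, "time elapsed" plus "time left" stays at roughly 10 seconds, and the command gives up once the remaining time goes negative. That pattern is a retry loop with a fixed time budget. The sketch below illustrates the pattern only; it is not the libwnbd implementation, and the function name, the 10-second default and the 1-second interval are assumptions read off the log.

```python
import time


def retry_until_deadline(operation, timeout=10.0, interval=1.0,
                         clock=time.monotonic, sleep=time.sleep, log=print):
    """Call operation() until it returns True or the budget is spent.

    Returns True on success and False once the time budget is used up.
    `clock`, `sleep` and `log` are injectable so the loop can be
    exercised without real waiting.
    """
    start = clock()
    while True:
        if operation():
            return True
        elapsed = clock() - start
        remaining = timeout - elapsed
        log(f"Could not reset adapter, device busy. "
            f"Time elapsed: {elapsed:.2f}s, time left: {remaining:.1f}s.")
        if remaining <= 0:
            return False  # budget exhausted, give up
        sleep(min(interval, remaining))
```

A single slow attempt can overshoot the budget, which would explain a final line such as "time left: -4.4s": the deadline is only checked between attempts, not during one.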

oops, I forgot the --hard-disconnect-mappings argument. Given that the exe was able to start and run, I think it will work. I'll try that next time. [EDIT] I did get to try it; I think it said 'operation vetoed', but I couldn't copy/save the error text.

I will log further details pertaining to this stuck situation in #63-comment-1508390090

petrutlucian94 commented 1 year ago

We found out that the IO deadlock was caused by having Windows caching enabled on the WNBD disk side as well as the underlying local NBD server side. Disabling caching on the qemu-storage-daemon side solved the issue (cache.direct=on). This does not affect Ceph.
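For readers unsure where cache.direct=on goes, here is a hedged sketch that builds a qemu-storage-daemon argument list with the option set on the file node. The exact invocation used in this thread is not shown, so the node names, export id, port, and vhdx format below are all assumptions for illustration; check the option spelling against the qemu-storage-daemon documentation for your build.

```python
def storage_daemon_argv(image_path, export_name,
                        host="127.0.0.1", port=10809, image_format="vhdx"):
    """Build an argv list for an NBD export with host caching bypassed.

    Node names (file0, disk0) and the export id (export0) are
    arbitrary labels chosen for this sketch.
    """
    # QEMU option strings use ',' as a separator; a literal comma
    # inside a value is written as ',,'.
    filename = image_path.replace(",", ",,")
    return [
        "qemu-storage-daemon",
        # Protocol node: open the image file with cache.direct=on so
        # the host page cache is bypassed on the server side.
        "--blockdev",
        f"driver=file,node-name=file0,filename={filename},cache.direct=on",
        # Format node layered on top of the file node.
        "--blockdev",
        f"driver={image_format},node-name=disk0,file=file0",
        # NBD server socket and the writable export itself.
        "--nbd-server",
        f"addr.type=inet,addr.host={host},addr.port={port}",
        "--export",
        f"type=nbd,id=export0,node-name=disk0,name={export_name},writable=on",
    ]
```

The list can be passed to subprocess.run without a shell, which avoids quoting problems with Windows paths.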

In the meantime, the nbd client functionality has been moved to libwnbd, and wnbd-client map became a blocking command. wnbd-client has just a few simple commands, so adding an interactive shell mode wouldn't help much IMHO.