At the moment, we have about 4 overlapping (providing the same fields),
highly interconnected structures that describe a device, as well as two device
lists. Locking isn't ideal either, we're using more locks than we need.
Operations such as creating, listing or removing devices are holding exclusive
locks more than they should.
At the same time, the device cleanup workflow is convoluted and doesn't
guarantee that all the pending requests are completed by the time we cleanup
disk devices or the entire extension, which often leads to memory leaks.
This refactoring is meant to greatly simplify device handling, making the driver
easier to maintain and also avoid a few memory leaks. Here are the changes that
we're going to make:
drop unneeded (but heavily used) structures: WNBD_LU_EXTENSION,
GLOBAL_INFORMATION
merge structures that provide device info: USER_ENTRY and
SCSI_DEVICE_INFORMATION are merged into the existing WNBD_SCSI_DEVICE struct
use reference counting (via Rundown Protection objects) at the disk device
as well as extension level. Before accessing a device, a reference must be
acquired (handled by WnbdFindDevice* functions), ensuring that it won't be
cleaned up while still being accessed. At the same time, the device reference
count will ensure that all devices are removed before we clean up the
extension.
Rundown Protection objects are highly convenient, providing locking as well.
We can wait for all pending operations to complete, and once we obtain the
lock, no other operations will be allowed.
we'll use one monitor/cleanup thread per device instead of the global one.
which will allow cleaning up devices in parallel. Having all the device
disconnect and cleanup logic in one place makes it easier to follow and less
error prone. It also makes it easier to tell the IRQL used when disconnecting
the device. We can also avoid device cleanup race conditions without having
to use some of the existing (global) locks.
at the moment, there are various way in which we attempt to disconnect or
cleanup a device (e.g. setting WNBD_SCSI_DEVICE.Missing,
WNBD_SCSI_DEVICE.ReportedMissing, PSCSI_DEVICE_INFORMATION.HardTermiateDevice
or calling DisconnectConnection, trigger the device cleanup thread).
that seems overly complicated. this change will allow removing a device
by either signaling the device terminate event or the global device terminate
event (which in turn will remove all devices).
In order to synchronously remove a device, the caller can hold a device
reference, obtain a device cleanup thread reference, signal the device
monitor thread and then wait for it to complete.
When cleaning up all the devices, all we have to do is signal the global
device terminate event and then wait for the device reference count to reach
0.
simplify outstanding IO tracking and move the counter to WNBD_DRV_STATS
simplify SRB status reporting, adding CompleteRequest
simplify socket cleanup helpers and fix incorrect naming (
currently DisconnectConnection closes the socket and CloseConnection
tries to gracefully disconnect it)
avoid using void pointers when not needed, it makes the interface difficult
to follow
Refactor WNBD device handling
At the moment, we have about 4 overlapping (providing the same fields), highly interconnected structures that describe a device, as well as two device lists. Locking isn't ideal either, we're using more locks than we need. Operations such as creating, listing or removing devices are holding exclusive locks more than they should.
At the same time, the device cleanup workflow is convoluted and doesn't guarantee that all the pending requests are completed by the time we cleanup disk devices or the entire extension, which often leads to memory leaks.
This refactoring is meant to greatly simplify device handling, making the driver easier to maintain and also avoid a few memory leaks. Here are the changes that we're going to make:
use reference counting (via Rundown Protection objects) at the disk device as well as extension level. Before accessing a device, a reference must be acquired (handled by WnbdFindDevice* functions), ensuring that it won't be cleaned up while still being accessed. At the same time, the device reference count will ensure that all devices are removed before we clean up the extension.
Rundown Protection objects are highly convenient, providing locking as well. We can wait for all pending operations to complete, and once we obtain the lock, no other operations will be allowed.
at the moment, there are various way in which we attempt to disconnect or cleanup a device (e.g. setting WNBD_SCSI_DEVICE.Missing, WNBD_SCSI_DEVICE.ReportedMissing, PSCSI_DEVICE_INFORMATION.HardTermiateDevice or calling DisconnectConnection, trigger the device cleanup thread). that seems overly complicated. this change will allow removing a device by either signaling the device terminate event or the global device terminate event (which in turn will remove all devices).
In order to synchronously remove a device, the caller can hold a device reference, obtain a device cleanup thread reference, signal the device monitor thread and then wait for it to complete.
When cleaning up all the devices, all we have to do is signal the global device terminate event and then wait for the device reference count to reach 0.