cloudbase / wnbd

Windows Ceph RBD NBD driver
GNU Lesser General Public License v2.1

Detect and remove stale connections #126

Closed: petrutlucian94 closed this 1 year ago

petrutlucian94 commented 1 year ago

Storport resets the LUN when requests time out. However, it never actually removes the disk. A stale disk can cause problems and may leave the host unresponsive in certain situations (e.g. cache deadlocks or hanging persistent reservation requests).

For this reason, we'll detect stale connections and disconnect the disk. This feature and its timeouts are configurable. By default, a connection is considered stale if at least one request older than 15s was aborted and no IO reply has been received in the last minute.

At the same time, we'll include the following timestamps in the wnbd-client.exe stats output:

Signed-off-by: Lucian Petrut <lpetrut@cloudbasesolutions.com>