NetworkBlockDevice / nbd

Network Block Device
GNU General Public License v2.0
450 stars 116 forks source link

FUA documentation issue #108

Closed petrutlucian94 closed 4 years ago

petrutlucian94 commented 4 years ago

The NBD protocol specs state that:

The client SHOULD NOT set this bit unless the command has the potential of writing data (current commands are NBD_CMD_WRITE, NBD_CMD_WRITE_ZEROES and NBD_CMD_TRIM), however note that existing clients are known to set this bit on other commands.

Yet as per the SCSI specs, the FUA flag may be used for read requests as well, ensuring that we're not reading unflushed data from a cache.

An FUA bit set to one specifies that the device server shall read the logical blocks from the specified data pattern for that LBA, the non-volatile cache (if any), or the medium. If a volatile cache contains a more recent version of a logical block, then the device server shall write that logical block to non-volatile cache or the medium before reading the logical block.

This can be a bit confusing, perhaps the docs should mention that the FUA flag can be used for read requests as well.

abligh commented 4 years ago

I believe that change would be incorrect (or rather would not be fixing a documentation issue, but be a change in the protocol). FUA was modelled on Linux Kernel's FUA use at the time, rather than on SCSI. At the time I believe FUA only applied to writes.

On 30 Jun 2020, at 17:03, Lucian Petrut notifications@github.com wrote:

The NBD protocol specs state that:

The client SHOULD NOT set this bit unless the command has the potential of writing data (current commands are NBD_CMD_WRITE, NBD_CMD_WRITE_ZEROES and NBD_CMD_TRIM), however note that existing clients are known to set this bit on other commands.

Yet as per the SCSI specs, the FUA flag may be used for read requests as well, ensuring that we're not reading unflushed data from a cache.

An FUA bit set to one specifies that the device server shall read the logical blocks from the specified data pattern for that LBA, the non-volatile cache (if any), or the medium. If a volatile cache contains a more recent version of a logical block, then the device server shall write that logical block to non-volatile cache or the medium before reading the logical block.

This can be a bit confusing, perhaps the docs should mention that the FUA flag can be used for read requests as well.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

-- Alex Bligh

yoe commented 4 years ago

The NBD protocol documentation does not define the SCSI protocol, it defines the NBD protocol. While I can understand the described semantics might make sense for SCSI, they are rather expensive to implement in a user space daemon such as an NBD implementation that makes use of fsync to implement FUA, which is a common way of doing so. Given that, I think implementing such semantics in NBD will create more problems than it solves, and so absent any further info I don't think it's something we need to implement, at all.

If you have a valid use case for this (other than SCSI does X, which is relevant to about the same extent as HTTP does X as far as NBD is concerned), I'm definitely interested in hearing about it; feel free to reopen if that's the case. Otherwise, I'm going to close this issue as I don't think it affects us.

Thanks for the input, at any rate.

petrutlucian94 commented 4 years ago

That makes sense. It's ok if FUA shoudn't be used with NBD reads, it's just that it wasn't clear to me if this was an oversight or something intentional so I was interested in a second opinion. Thanks a lot for clearing it out!

This came up as we were implementing a SCSI interface over NBD, actually a Windows NBD client as a storport miniport driver.

yoe commented 4 years ago

Cool! I had been planning to implement that myself, but just never got around to it. What's the context of this work, and are there plans of making that open source, too?

petrutlucian94 commented 4 years ago

It's open source: https://github.com/cloudbase/wnbd.

We did this while implementing Ceph client side support for Windows. Ceph was already using NBD for RBD, so we decided to go the same route, especially as having an NBD client available would be very useful in other situations as well - for example we're looking at qemu-nbd, allowing a wide range of virtual disk images to be attached to Windows hosts.

We're halfway there merging the Ceph code upstream: https://github.com/ceph/ceph/pull/34859. After we have that covered and after we clean up WNBD a bit and implement a few missing features (e.g. unmap, flush), we're probably going to move WNBD to the Ceph Github project.

yoe commented 4 years ago

I see.

Don't have the time to do some extensive testing right now (definitely plan to look at this in some more detail soon), but one thing that jumps to mind is this: IANA assigned port 10809 to NBD; I think it would make sense to keep that the default port for NBD implementations, unless the user explicitly specifies something different. At least your documentation currently seems to use 9000 as the port number rather than 10809, which seems... suboptimal.

Other than that one nit, this looks awesome :-)