This patch implements the NVMe block device driver. It is largely based on the pull request submitted by Jan Braunwarth (see https://github.com/cloudius-systems/osv/pull/1284), so most of the credit goes to Jan.
As his PR explains, OSv can be started with an emulated NVMe disk on QEMU like so:
./scripts/run.py --nvme
Compared to Jan's PR, this patch is different in the following ways:
removes all non-NVMe changes (various bug fixes and ioctl enhancements are part of separate PRs)
replaces most of the heap allocations with stack allocations, which should reduce some contention
tweaks the PRP-handling code to use a lock-less ring buffer, which should further reduce contention when allocating memory (see the first sketch after this list)
fixes a bug in the I/O queue CQ handling so that the driver correctly determines whether the SQ is full (see the second sketch after this list)
assumes a single namespace with ID 1 (most of the logic to deal with more namespaces has been preserved)
reduces the I/O queue size from 256 to 64
makes the code a little more DRY
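
To illustrate the PRP-handling change, here is a minimal sketch of a lock-less (single-producer/single-consumer) ring buffer holding pre-allocated pages that PRP lists can be taken from on the I/O path. The type and variable names are hypothetical; the actual patch may rely on OSv's own lock-free ring primitives rather than this hand-rolled version:

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical sketch of a lock-less SPSC ring of pre-allocated pages used
// for building PRP lists, avoiding malloc()/free() on the I/O path.
template <typename T, size_t N>   // N must be a power of two
class spsc_ring {
public:
    bool push(T item) {
        auto head = _head.load(std::memory_order_relaxed);
        auto tail = _tail.load(std::memory_order_acquire);
        if (head - tail == N) {
            return false;            // ring is full
        }
        _slots[head & (N - 1)] = item;
        _head.store(head + 1, std::memory_order_release);
        return true;
    }
    bool pop(T& item) {
        auto tail = _tail.load(std::memory_order_relaxed);
        auto head = _head.load(std::memory_order_acquire);
        if (tail == head) {
            return false;            // ring is empty
        }
        item = _slots[tail & (N - 1)];
        _tail.store(tail + 1, std::memory_order_release);
        return true;
    }
private:
    T _slots[N];
    std::atomic<size_t> _head{0};    // next slot the producer will write
    std::atomic<size_t> _tail{0};    // next slot the consumer will read
};

// Usage idea: fill the ring with pre-allocated pages when the queue is
// created, pop() one when a request needs a PRP list, and push() it back
// when the request completes.
// spsc_ring<void*, 64> prp_pages;
```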
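And here is a sketch of how the SQ Head Pointer reported in each NVMe completion queue entry can be used to decide whether the paired submission queue is full. The CQ entry layout follows the NVMe specification, but the field, struct, and function names are illustrative rather than the ones used in this patch:

```cpp
#include <cstdint>

// Completion queue entry as defined by the NVMe spec (DW0-DW3).
struct nvme_cq_entry {
    uint32_t command_specific; // DW0
    uint32_t reserved;         // DW1
    uint16_t sq_head;          // DW2[15:0]  - SQ head as seen by the controller
    uint16_t sq_id;            // DW2[31:16] - submission queue identifier
    uint16_t cid;              // DW3[15:0]  - command identifier
    uint16_t status;           // DW3[31:16] - phase bit + status field
};

// Hypothetical per-queue bookkeeping kept by the driver.
struct io_queue_state {
    uint32_t sq_tail = 0;      // next SQ slot the driver will write
    uint32_t sq_head = 0;      // last head value reported via the CQ
    uint32_t depth   = 64;     // I/O queue size used by this patch
};

// Called for every completion reaped from the CQ: the controller has
// consumed SQ entries up to the reported head.
inline void on_completion(io_queue_state& q, const nvme_cq_entry& cqe) {
    q.sq_head = cqe.sq_head;
}

// The SQ is full when advancing the tail would collide with the head;
// one slot is left unused to distinguish "full" from "empty".
inline bool sq_full(const io_queue_state& q) {
    return ((q.sq_tail + 1) % q.depth) == q.sq_head;
}
```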
Please note that, as Jan points out, the block cache logic of splitting reads and writes into 512-byte requests causes very poor performance when stress-testing at the devfs level. However, this behavior is not NVMe-specific, and it does not affect most applications, which go through the VFS and a filesystem driver (ZFS, EXT, ROFS) that uses the strategy() method and therefore bypasses the block cache.
Based on my tests, the NVMe read performance (IOPS and bytes/s) on QEMU is 60-70% of that of virtio-blk. I do not know how much of that gap is due to this implementation of the NVMe driver and how much is because virtio-blk is by design much faster than anything emulated, including NVMe.
Closes #1203