akiradeveloper / dm-writeboost

Log-structured Caching for Linux
GNU General Public License v2.0

Support for 4kn devices #176

Closed: thorvald closed this issue 7 years ago

thorvald commented 7 years ago

Our NVMe drives are usually low-level formatted with a 4k physical block size. With that, dm-writeboost refuses to use them as caching devices, since it cannot read the superblock -- issuing a 512-byte read to a 4Kn device always fails.

[  823.102234] blk_update_request: I/O error, dev nvme0n1, sector 2048
[  823.108517] device-mapper: writeboost: read_superblock_header() I/O error(-5), bits(18446744073709551615), dev(259:3), sector(0), read
[  823.120595] device-mapper: writeboost: read_superblock_header failed
[  823.126949] device-mapper: writeboost: audit_cache_device failed

I tried changing the buf_1 pool to be 4096 bytes and modifying read_superblock_header to read 1<<3 sectors, and that allows dm-writeboost to progress further. Encouraged by that, I tried reformatting the NVMe to 512-byte physical blocks, at which point all the errors went away.
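Roughly, the change I experimented with looks like the sketch below (paraphrased with illustrative names, not the exact diff; bdev_logical_block_size() is the standard helper from linux/blkdev.h). The idea is to size the superblock read by the device's logical block size instead of hard-coding one 512-byte sector.

static void *alloc_superblock_buf(struct block_device *bdev, sector_t *n_sectors)
{
        /* 512 on legacy drives, 4096 on a 4Kn drive */
        unsigned lbs = bdev_logical_block_size(bdev);

        *n_sectors = lbs >> SECTOR_SHIFT;       /* 1 << 3 sectors on 4Kn */
        return kzalloc(lbs, GFP_KERNEL);        /* replaces the 512-byte buf_1 buffer */
}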

Any plans to support 4kn cache devices?

Also, our backing devices are 4kn and don't support changing the block size. My understanding is that dm-writeboost always issues 4k IO, and initial testing looks good, but are there any caveats with 4kn backing devices?

akiradeveloper commented 7 years ago

@thorvald Thanks for reporting. I've investigated the code a bit and found that buf_1_pool is only used to read/write the superblock header and record.

I tried changing the buf_1 pool to be 4096 bytes and modifying read_superblock_header to read 1<<3 sectors, and that allows dm-writeboost to progress further

So this should work, but I suppose it will fail again once you turn on sync_data_interval, because that path submits bios at 512B granularity. The workaround of switching the pool to buf_8_pool won't help there.
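To put it another way, every bio submitted to the cache device has to start and end on a logical-block boundary. Something like the following check (illustrative only, not existing code) would have to hold for every metadata and data write, not just for read_superblock_header:

static bool bio_fits_lbs(struct block_device *bdev, sector_t start, sector_t count)
{
        /* sectors per logical block: 1 on 512B devices, 8 on 4Kn devices */
        unsigned lbs_sectors = bdev_logical_block_size(bdev) >> SECTOR_SHIFT;

        return (start % lbs_sectors == 0) && (count % lbs_sectors == 0);
}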

Any plans to support 4kn cache devices?

I think it's a good idea to support 4KB cache devices.

Also, our backing devices are 4kn and don't support changing the block size. My understanding is that dm-writeboost always issues 4k IO, and initial testing looks good, but are there any caveats with 4kn backing devices?

You will be fine as long as applications (typically the filesystem) submit I/O only in 4KB alignment; otherwise Writeboost redirects partial (less than 4KB) reads to the backing device.

static int process_read(struct wb_device *wb, struct bio *bio)
{
        struct lookup_result res;
        struct dirtiness dirtiness;
        struct per_bio_data *pbd;

        bool reserved = false;

        mutex_lock(&wb->io_lock);
        cache_lookup(wb, bio, &res);
        if (!res.found)
                reserved = reserve_read_cache_cell(wb, bio);
        mutex_unlock(&wb->io_lock);

        if (!res.found) {
                if (reserved) {
                        /*
                         * Remapping clone bio to the backing store leads to
                         * empty payload in clone_endio().
                         * To avoid caching junk data, we need this workaround
                         * to call dm_io() to certainly fill the bio payload.
                         */
                        if (read_backing_async(wb, bio)) {
                                struct read_backing_async_context ctx = {
                                        .wb = wb,
                                        .bio = bio
                                };
                                read_backing_async_callback_onstack(1, &ctx);
                        }
                        return DM_MAPIO_SUBMITTED;
                } else {
                        bio_remap(bio, wb->backing_dev, bi_sector(bio));
                        return DM_MAPIO_REMAPPED;
                }
        }

So we need to encourage the upper layer to submit only bios that are aligned to 4KB. But as far as I understand, the blk_io_limits_xxx settings are only hints.

https://people.redhat.com/msnitzer/docs/io-limits.txt

Direct I/O best practices

Users must always take care to use properly aligned and sized IO. This is especially important for Direct I/O access. Direct I/O should be aligned on a 'logical_block_size' boundary and in multiples of the 'logical_block_size'. With native 4K devices (logical_block_size is 4K) it is now critical that applications perform Direct I/O that is a multiple of the device's 'logical_block_size'. This means that applications that do not perform 4K aligned I/O, but 512-byte aligned I/O, will break with native 4K devices. Applications may consult a device's "I/O Limits" to ensure they are using properly aligned and sized I/O. The "I/O Limits" are exposed through both sysfs and block device ioctl interfaces (also see: libblkid).
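On that note, a device-mapper target can at least advertise 4KB granularity to the upper layers through its io_hints hook. A sketch like the one below (not current writeboost code, just how the queue limits could be set) would expose 4KB as the logical block size and the preferred I/O size, though as said above io_min/io_opt are only hints to the upper layer.

static void writeboost_io_hints(struct dm_target *ti, struct queue_limits *limits)
{
        /* advertise 4KB granularity; wired up via .io_hints in target_type */
        limits->logical_block_size = 4096;
        limits->physical_block_size = 4096;
        blk_limits_io_min(limits, 4096);        /* preferred minimum I/O size */
        blk_limits_io_opt(limits, 4096);        /* optimal I/O size */
}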

akiradeveloper commented 7 years ago

@thorvald Hi, if you have spare time, could you test with the for-thorvald branch? https://github.com/akiradeveloper/dm-writeboost/tree/for-thorvald

It's OK to just check that you can initialize the device and do some I/O.