OpenChannelSSD / qemu-nvme

The LightNVM qemu implementation, based on NVMe
http://openchannelssd.readthedocs.org/en/latest/
Other
131 stars 67 forks source link

Implement support for DLFEAT/DULBE #43

Closed birkelund closed 5 years ago

birkelund commented 6 years ago

Hi Matias,

Please consider reviewing and pulling my branch which implements support for DLFEAT/DULBE for both the scalar and vector paths.

It also introduces checks that enforce the read/write access rules (MW_CUNITS, WS_MIN).

MatiasBjorling commented 6 years ago

pblk falls over when I try to test this.

[ 15.000307] pblk pblk0: ppa: (seq: 1):ch:0,lun:1,chk:0,sec:309 [ 15.000855] pblk pblk0: corrupted read LBA (0/1221) [ 15.001322] WARNING: CPU: 2 PID: 0 at drivers/lightnvm/pblk-read.c:125 pblk_end_io_read+0x19c/0x220 [ 15.002187] Modules linked in: [ 15.002483] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W 4.18.0-rc4+ #1033 [ 15.003238] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 15.004443] RIP: 0010:pblk_end_io_read+0x19c/0x220 [ 15.004915] Code: ca ff 58 5a 4c 8b 5d b0 4c 8b 55 b8 49 8b 46 08 4c 89 d1 4c 89 ea 48 c7 c7 40 f6 df 81 4c 89 5d b8 48 8d 70 0c e8 e7 63 ca ff <0f> 0b 4c 8b 5d b8 e9 e8 fe ff ff 0f b6 50 06 c0 ea 04 0f b6 d2 52 [ 15.006688] RSP: 0018:ffff88013fd03d70 EFLAGS: 00010082 [ 15.007181] RAX: 0000000000000027 RBX: 0000000000000001 RCX: 0000000000000006 [ 15.007853] RDX: 0000000000000000 RSI: 0000000000000086 RDI: ffff88013fd15090 [ 15.008522] RBP: ffff88013fd03dc0 R08: 0000000000000000 R09: 000000000000033b [ 15.009191] R10: 00000000000004c5 R11: ffffffff8224f80d R12: ffff880139cc6018 [ 15.009859] R13: 0000000000000000 R14: ffff88013b2ae800 R15: 0000000000000004 [ 15.010531] FS: 0000000000000000(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000 [ 15.011293] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.011832] CR2: 00007fe6d26c11d8 CR3: 0000000002009005 CR4: 00000000003606a0 [ 15.012497] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 15.013161] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 15.013824] Call Trace: [ 15.014059] [ 15.014255] pblk_end_io_read+0x39/0x60 [ 15.014618] nvm_end_io+0x3a/0x40 [ 15.014937] nvme_nvm_end_io+0x2e/0x50 [ 15.015291] blk_mq_end_request+0x7d/0xd0 [ 15.015670] nvme_complete_rq+0x74/0x190 [ 15.016040] nvme_pci_complete_rq+0x7d/0x130 [ 15.016443] blk_mq_complete_request+0x9f/0x100 [ 15.016868] nvme_irq+0x131/0x1c0 [ 15.017184] __handle_irq_event_percpu+0x3f/0x180 [ 15.017626] handle_irq_event_percpu+0x32/0x80 [ 15.018043] handle_irq_event+0x2c/0x50 [ 15.018405] handle_edge_irq+0x69/0x190 [ 15.018770] handle_irq+0x1a/0x30 [ 15.019084] do_IRQ+0x46/0xd0 [ 15.019369] common_interrupt+0xf/0xf [ 15.019715] [ 15.019920] RIP: 0010:default_idle+0x1d/0xd0 [ 15.020323] Code: eb ae e8 a6 32 a5 ff 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 65 44 8b 25 5b cf 9e 7e 53 0f 1f 44 00 00 fb f4 <65> 44 8b 25 4b cf 9e 7e 0f 1f 44 00 00 5b 41 5c 41 5d 5d c3 65 8b [ 15.022070] RSP: 0018:ffffc9000007fe78 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde [ 15.022776] RAX: 0000000080000000 RBX: 0000000000000002 RCX: ffff88013fd19120 [ 15.023440] RDX: ffffffff8203afe8 RSI: ffff88013fd19120 RDI: 0000000000000002 [ 15.024103] RBP: ffffc9000007fe90 R08: ffff88013fd1b680 R09: 0000000000000000 [ 15.024767] R10: ffffc9000007fe78 R11: 0000000000000001 R12: 0000000000000002 [ 15.025429] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 15.026093] ? sched_clock_cpu+0x11/0xb0 [ 15.026463] arch_cpu_idle+0xf/0x20 [ 15.026796] default_idle_call+0x23/0x30 [ 15.027166] do_idle+0x1d9/0x200 [ 15.027473] cpu_startup_entry+0x73/0x80 [ 15.027844] start_secondary+0x178/0x1c0 [ 15.028214] secondary_startup_64+0xa5/0xb0 [ 15.028608] ---[ end trace b9258e4df8cf4f57 ]--- [ 15.029040] pblk pblk0: ppa: (seq: 2):ch:0,lun:1,chk:0,sec:310 [ 15.029584] pblk pblk0: corrupted read LBA (0/1222) [ 15.030045] WARNING: CPU: 2 PID: 0 at drivers/lightnvm/pblk-read.c:125 pblk_end_io_read+0x19c/0x220 [ 15.030904] Modules linked in: [ 15.031197] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G W 4.18.0-rc4+ #1033 [ 15.031943] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 [ 15.033135] RIP: 0010:pblk_end_io_read+0x19c/0x220 [ 15.033603] Code: ca ff 58 5a 4c 8b 5d b0 4c 8b 55 b8 49 8b 46 08 4c 89 d1 4c 89 ea 48 c7 c7 40 f6 df 81 4c 89 5d b8 48 8d 70 0c e8 e7 63 ca ff <0f> 0b 4c 8b 5d b8 e9 e8 fe ff ff 0f b6 50 06 c0 ea 04 0f b6 d2 52

MatiasBjorling commented 6 years ago

Qemu device config

-device nvme,drive=mynvme,serial=deadbeef,lnum_pu=4,lstrict=1,meta=16,mc=2,lsecs_per_chk=4096,lws_min=4,lws_opt=8 \

Tested with for-4.20/core

MatiasBjorling commented 6 years ago

Looks like error reporting always is activated. Is there some way to default this to disabled, and have an option to enable it using the qemu device string?

birkelund commented 6 years ago

I actually have not tested with pblk, just nvme-cli and liblightnvm. But I'll look into it.

DULBE is off by default. It needs to be explicitly activated using e.g.

nvme set-feature /dev/nvme0 -n 1 -f 0x5 -v $(( 1 << 16 ))

MatiasBjorling commented 6 years ago

You're right. I had these in mind:

nvme: accessing unmapped chunk: ch:0, lun:3, chk:63, cno: 207 nvme: accessing unmapped chunk: ch:0, lun:3, chk:63, cno: 207 nvme: accessing unmapped chunk: ch:0, lun:3, chk:63, cno: 207 nvme: accessing unmapped chunk: ch:0, lun:3, chk:63, cno: 207 nvme: accessing unmapped chunk: ch:0, lun:3, chk:63, cno: 207 nvme: accessing unmapped chunk: ch:0, lun:3, chk:63, cno: 207 nvme: accessing unmapped chunk: ch:0, lun:3, chk:63, cno: 207 nvme: accessing unmapped chunk: ch:0, lun:3, chk:63, cno: 207 nvme: accessing unmapped chunk: ch:0, lun:3, chk:63, cno: 207 nvme: accessing unmapped chunk: ch:0, lun:3, chk:63, cno: 207 nvme: accessing unmapped chunk: ch:0, lun:3, chk:63, cno: 207 nvme: accessing unmapped chunk: ch:0, lun:3, chk:63, cno: 207 nvme: accessing unmapped chunk: ch:0, lun:3, chk:63, cno: 207

Which was reported on partition scan.

birkelund commented 6 years ago

Yeah, the hw will report that, but still return success and 0x00 to the host.

And yeah. I also seeing some corrupted reads for pblk. I'll look into it and report back.

birkelund commented 6 years ago

OK, I can now successfully run a fio verify test on pblk.

The issue was in function lnvm_rw_check_chunk_read, where I check for the read being valid with respect to MW_CUNITS. According to the spec a read is only valid if READ < (WP LBA - MW_CUNITS), causing me to implement that check as:

if (state == LNVM_CHUNK_OPEN || state == LNVM_CHUNK_CLOSED) {
    uint64_t wpp = wp < mw_cunits ? 0 : wp;
    if (state == LNVM_CHUNK_OPEN && wpp) {
        wpp = wp-mw_cunits;
    }

    if (end_sectr < wpp) {
        return 0;
    }
}

pblk assumes that it is allowed to read at the end_sectr, so the condition becomes end_sectr <= wpp instead. This might be something that needs to go into an errata for the spec? It was a bit unclear to me.

MatiasBjorling commented 6 years ago

Uh, so pblk thinks that it can read from the WP offset? It should only read up till the write pointer.

javigon commented 6 years ago

It has nothing to do with pblk. pblk reads WP - MW_CUNITS in the chunk as originally defined. The way it is worded in the spec creates confusion, and probably one of the formulas for the read access should be <=

MatiasBjorling commented 6 years ago

Here is an example:

MW_CUNITS = 16 Chunk size = 64 (0...63)

Chunk State: Open WP = 32

In that case,

WP - MW_CUNITS = 16

Valid reads from 0-15. That equals to the bound checking for < 16, and not <= 16.

Maybe "Read between (WP - MW_CUNITS) LBA and WP LBA" could be reworded to make it explicit that it includes WP - MW_CUNITS.

javigon commented 6 years ago

That is my understanding too. It is true though that the spec is not clear about this point, as it can be interpreted 0-indexed.

We are double checking pblk, as Klaus' test look ok.

MatiasBjorling commented 6 years ago

Hehe, it is not the first time the zero-based value has caused questions.

javigon commented 6 years ago

There is a corner case in which pblk does not respect mw_cunits. I just submitted a patch to fix this upstream.

birkelund commented 6 years ago

I've tested @javigon's patches on my branch. pblk is happy now.

birkelund commented 5 years ago

Closing in favor of other.