SoftRoCE / rxe-dev

Development Repository for RXE

Kernel panic #54

Open johnsonyjose opened 8 years ago


Hi All,

I have two machines, each with an ordinary (non-RDMA) NIC. One machine acts as the NVMe host and the other as the NVMe target. The target exports the null block device (NULL_BLOCK_DEVICE) provided by Linux. NVMe discovery and connect commands work fine, and data transfers over the Soft-RoCE interface also succeed.
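For reference, a target like the one described can be configured through the kernel's nvmet configfs interface. The sketch below is an assumed reconstruction, not the reporter's actual setup: the subsystem name `testsubsystem` and the address `192.168.0.154:1023` come from the log in this report, while the port ID `1`, namespace ID `1` and the `/dev/nullb0` device (the default name created by the `null_blk` module) are assumptions. On a real system it needs root, a mounted configfs, and the `nvmet`, `nvmet-rdma`, `null_blk` and `rdma_rxe` modules loaded; the `root` parameter exists only so the logic can be tried against a scratch directory.

```python
import os

# Assumption: the usual nvmet configfs mount point.
NVMET_ROOT = "/sys/kernel/config/nvmet"


def _write(path: str, value: str) -> None:
    """Write one configfs attribute (a plain file outside configfs)."""
    with open(path, "w") as f:
        f.write(value)


def setup_null_blk_target(root: str = NVMET_ROOT,
                          nqn: str = "testsubsystem",
                          device: str = "/dev/nullb0",
                          traddr: str = "192.168.0.154",
                          trsvcid: str = "1023",
                          port_id: str = "1") -> str:
    """Export `device` as namespace 1 of subsystem `nqn` on an RDMA port.

    Port ID, namespace ID and device path are illustrative assumptions.
    Returns the path of the port -> subsystem symlink.
    """
    subsys = os.path.join(root, "subsystems", nqn)
    ns = os.path.join(subsys, "namespaces", "1")
    os.makedirs(ns, exist_ok=True)
    _write(os.path.join(subsys, "attr_allow_any_host"), "1")  # test setups only
    _write(os.path.join(ns, "device_path"), device)
    _write(os.path.join(ns, "enable"), "1")

    port = os.path.join(root, "ports", port_id)
    os.makedirs(os.path.join(port, "subsystems"), exist_ok=True)
    _write(os.path.join(port, "addr_trtype"), "rdma")
    _write(os.path.join(port, "addr_adrfam"), "ipv4")
    _write(os.path.join(port, "addr_traddr"), traddr)
    _write(os.path.join(port, "addr_trsvcid"), trsvcid)

    # Linking the subsystem into the port makes the target listen on it.
    link = os.path.join(port, "subsystems", nqn)
    if not os.path.islink(link):
        os.symlink(subsys, link)
    return link
```

On the host side, the matching step would be an nvme-cli connect against the same address, service ID and NQN, e.g. `nvme connect -t rdma -a 192.168.0.154 -s 1023 -n testsubsystem`.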

When I run read I/O with fio, the NVMe host starts reconnecting to the target and then the kernel panics. The stack trace points to rdma_disconnect(), called from nvme_rdma_stop_and_free_queue() in the reconnect work.
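The report does not include the exact fio invocation. The helper below builds one plausible direct-I/O read job against the NVMe-oF namespace; the device path `/dev/nvme0n1` comes from the log, but every other parameter (job name, random reads, 4 KiB blocks, `libaio`, queue depth 32, 4 jobs, 300 s runtime) is a hypothetical choice for illustration.

```python
import shlex


def fio_read_cmd(device: str = "/dev/nvme0n1", rw: str = "randread",
                 bs: str = "4k", iodepth: int = 32, numjobs: int = 4,
                 runtime_s: int = 300) -> list[str]:
    """Build argv for a time-based, direct-I/O fio read job on `device`.

    All defaults except the device path are illustrative assumptions.
    """
    return [
        "fio",
        "--name=nvmf-read",        # hypothetical job name
        f"--filename={device}",
        f"--rw={rw}",              # "randread" or "read"
        f"--bs={bs}",
        "--ioengine=libaio",
        "--direct=1",              # bypass the page cache so I/O reaches the fabric
        f"--iodepth={iodepth}",
        f"--numjobs={numjobs}",
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",
    ]


if __name__ == "__main__":
    # Print the command rather than running it; execute it as root on the host.
    print(shlex.join(fio_read_cmd()))
```

To actually run it, pass the list to `subprocess.run(..., check=True)`; keeping argv as a list avoids shell quoting issues. Use `rw="read"` for sequential reads.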

Below is the kernel log from when the panic happened, ending with the stack trace:

    Sep 16 16:40:44 john kernel: [ 4660.937003] nvme nvme0: rdma_resolve_addr wait failed (-104).
    Sep 16 16:40:53 john kernel: [ 4669.289136] rxe: set rxe0 active
    Sep 16 16:40:53 john kernel: [ 4669.289138] rxe: added rxe0 to eno1
    Sep 16 16:40:53 john kernel: [ 4669.291500] interface en01 not found
    Sep 16 16:41:03 john kernel: [ 4679.172136] nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.discovery", addr 192.168.0.154:1023
    Sep 16 16:41:05 john kernel: [ 4681.896008] nvme nvme0: creating 4 I/O queues.
    Sep 16 16:41:05 john kernel: [ 4681.928447] nvme nvme0: new ctrl: NQN "testsubsystem", addr 192.168.0.154:1023
    Sep 16 16:48:32 john kernel: [ 5128.118832] blk_update_request: I/O error, dev nvme0n1, sector 664872
    Sep 16 16:48:32 john kernel: [ 5128.125658] blk_update_request: I/O error, dev nvme0n1, sector 1614312
    Sep 16 16:48:32 john kernel: [ 5128.132569] blk_update_request: I/O error, dev nvme0n1, sector 1309672
    Sep 16 16:48:32 john kernel: [ 5128.139307] blk_update_request: I/O error, dev nvme0n1, sector 1240976
    Sep 16 16:48:32 john kernel: [ 5128.146346] blk_update_request: I/O error, dev nvme0n1, sector 2037616
    Sep 16 16:48:32 john kernel: [ 5128.154293] blk_update_request: I/O error, dev nvme0n1, sector 450352
    Sep 16 16:48:32 john kernel: [ 5128.162782] blk_update_request: I/O error, dev nvme0n1, sector 1719776
    Sep 16 16:48:32 john kernel: [ 5128.170989] blk_update_request: I/O error, dev nvme0n1, sector 441656
    Sep 16 16:48:32 john kernel: [ 5128.178936] blk_update_request: I/O error, dev nvme0n1, sector 668736
    Sep 16 16:48:32 john kernel: [ 5128.187821] blk_update_request: I/O error, dev nvme0n1, sector 1249384
    Sep 16 16:48:32 john kernel: [ 5128.195526] nvme nvme0: reconnecting in 10 seconds
    Sep 16 16:48:53 john kernel: [ 5149.206030] nvme nvme0: failed nvme_keep_alive_end_io error=16391
    Sep 16 16:49:42 john kernel: [ 5198.356270] nvme nvme0: Connect command failed, error wo/DNR bit: 7
    Sep 16 16:49:42 john kernel: [ 5198.362922] nvme nvme0: Failed reconnect attempt, requeueing...
    Sep 16 16:49:53 john kernel: [ 5209.619737] nvme nvme0: rdma_resolve_addr wait failed (-110).
    Sep 16 16:49:53 john kernel: [ 5209.620031] nvme nvme0: Failed reconnect attempt, requeueing...
    [ 5219.859419] general protection fault: 0000 [#1] SMP
    [ 5219.864479] Modules linked in: rdma_ucm ib_uverbs nvme_rdma(OE) rdma_cm iw_cm ib_cm configfs nvme_fabrics(OE) nvme_core(OE) rdma_rxe ip6_udp_tunnel udp_tunnel ib_core binfmt_misc snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel intel_powerclamp snd_hda_codec coretemp kvm_intel snd_hda_core kvm snd_hwdep snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq gpio_ich joydev snd_seq_device input_leds snd_timer snd irqbypass mei_me serio_raw mei soundcore lpc_ich mac_hid parport_pc ppdev lp parport autofs4 i915 hid_microsoft hid_generic i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops e1000e psmouse usbhid ptp hid drm pps_core pata_acpi fjes video
    [ 5219.931164] CPU: 3 PID: 4130 Comm: kworker/3:0 Tainted: G OE 4.8.0-rc1+ #1
    [ 5219.939458] Hardware name: /DH55TC, BIOS TCIBX10H.86A.0037.2010.0614.1712 06/14/2010
    [ 5219.949302] Workqueue: nvme_rdma_wq nvme_rdma_reconnect_ctrl_work [nvme_rdma]
    [ 5219.956929] task: ffff8d0d2b8b4240 task.stack: ffff8d0d87ab8000
    [ 5219.963223] RIP: 0010:[] [] rdma_disconnect+0x2e/0x90 [rdma_cm]
    [ 5219.972958] RSP: 0018:ffff8d0d87abbdb0 EFLAGS: 00010206
    [ 5219.978541] RAX: 6e5f656572745f88 RBX: ffff8d0d34914400 RCX: 0000000000000001
    [ 5219.986052] RDX: ffff8d0d34917800 RSI: ffff8d0d35cd8580 RDI: ffff8d0d2b399a00
    [ 5219.993504] RBP: ffff8d0d87abbdb8 R08: ffff8d0da34d8c40 R09: 0000000000000002
    [ 5220.001116] R10: 0000000000000000 R11: 0000000000003000 R12: ffff8d0d915e9930
    [ 5220.008680] R13: ffffe58dffac2600 R14: 00000000000000c0 R15: ffff8d0d915e9930
    [ 5220.016211] FS: 0000000000000000(0000) GS:ffff8d0da34c0000(0000) knlGS:0000000000000000
    [ 5220.024747] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 5220.030719] CR2: 0000556ef4ce1db8 CR3: 00000000afe06000 CR4: 00000000000006e0
    [ 5220.038181] Stack:
    [ 5220.040308] ffff8d0d914b2400 ffff8d0d87abbdd0 ffffffffc061184e ffff8d0d915e9800
    [ 5220.048170] ffff8d0d87abbdf8 ffffffffc0611aef ffff8d0d909b2480 ffff8d0da34d8c40
    [ 5220.056159] ffffe58dffac2600 ffff8d0d87abbe38 ffffffff8909eac2 0000000000000000
    [ 5220.064048] Call Trace:
    [ 5220.066635] [] nvme_rdma_stop_and_free_queue+0x1e/0x40 [nvme_rdma]
    [ 5220.074886] [] nvme_rdma_reconnect_ctrl_work+0x7f/0x1d0 [nvme_rdma]
    [ 5220.083235] [] process_one_work+0x162/0x4b0
    [ 5220.089394] [] worker_thread+0x4b/0x4f0
    [ 5220.095199] [] ? process_one_work+0x4b0/0x4b0
    [ 5220.101693] [] ? process_one_work+0x4b0/0x4b0
    [ 5220.108080] [] kthread+0xf8/0x110
    [ 5220.113441] [] ret_from_fork+0x1f/0x40
    [ 5220.119170] [] ? kthread_worker_fn+0x1a0/0x1a0
    [ 5220.125594] Code: 66 90 55 48 89 e5 53 48 89 fb 48 8b bf 00 03 00 00 48 85 ff 74 65 0f b6 83 b8 01 00 00 48 8b 13 48 c1 e0 04 48 03 82 f8 00 00 00 <8b> 50 08 f6 c2 04 75 14 83 e2 08 b8 ea ff ff ff 74 07 31 f6 e8
    [ 5220.146752] RIP [] rdma_disconnect+0x2e/0x90 [rdma_cm]
    [ 5220.153918] RSP
    [ 5220.168895] ---[ end trace 4e3fbc3ad0b11617 ]---
    [ 5220.168899] Kernel panic - not syncing: Fatal exception
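Two values in this log can be decoded mechanically; the sketch below is a hedged aid to reading the trace, not a root-cause analysis. First, `error=16391` is `0x4007`. Assuming the Linux `nvme.h` bit layout (`NVME_SC_DNR = 0x4000`, `NVME_SC_MORE = 0x2000`), that is generic status code `0x7` (Command Abort Requested) with the Do Not Retry bit set, which matches the later `error wo/DNR bit: 7`. Second, the faulting instruction marked `<8b>` in the `Code:` line is `8b 50 08`, i.e. `mov edx, [rax+0x8]`, and RAX holds `0x6e5f656572745f88`. Assuming 48-bit virtual addresses, that value is not a canonical x86-64 address, and dereferencing a non-canonical address raises a general protection fault rather than a page fault, consistent with the oops header. Its bytes also read as ASCII (`_tree_n`), which often means a pointer was loaded from memory that now holds unrelated data, for example after an object was freed and reused.

```python
# Values copied verbatim from the log in this report.
KEEP_ALIVE_STATUS = 16391   # "failed nvme_keep_alive_end_io error=16391"
RAX = 0x6E5F656572745F88    # RAX from the oops register dump
CODE_LINE = (
    "Code: 66 90 55 48 89 e5 53 48 89 fb 48 8b bf 00 03 00 00 48 85 ff 74 65 "
    "0f b6 83 b8 01 00 00 48 8b 13 48 c1 e0 04 48 03 82 f8 00 00 00 <8b> 50 "
    "08 f6 c2 04 75 14 83 e2 08 b8 ea ff ff ff 74 07 31 f6 e8"
)

# Assumed bit layout, mirroring NVME_SC_MORE / NVME_SC_DNR in Linux nvme.h.
NVME_SC_MORE = 0x2000
NVME_SC_DNR = 0x4000


def decode_nvme_status(status: int) -> dict:
    """Split a Linux NVMe status value into type, code and flag bits."""
    return {
        "sct": (status >> 8) & 0x7,         # status code type (0 = generic)
        "sc": status & 0xFF,                # 0x7 = Command Abort Requested
        "more": bool(status & NVME_SC_MORE),
        "dnr": bool(status & NVME_SC_DNR),  # Do Not Retry
    }


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """x86-64: bits 63..va_bits-1 must be all zeros or all ones."""
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (65 - va_bits)) - 1


def printable_bytes(value: int) -> str:
    """Render a 64-bit register as little-endian bytes; '.' marks non-printables."""
    return "".join(
        chr(b) if 0x20 <= b < 0x7F else "." for b in value.to_bytes(8, "little")
    )


def faulting_bytes(code_line: str) -> list[int]:
    """Return the bytes from the <..>-marked (faulting) instruction onward."""
    tokens = code_line.split()[1:]  # drop the "Code:" label
    start = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    return [int(t.strip("<>"), 16) for t in tokens[start:]]


if __name__ == "__main__":
    print(hex(KEEP_ALIVE_STATUS), decode_nvme_status(KEEP_ALIVE_STATUS))
    print("RAX canonical:", is_canonical(RAX), "bytes:", printable_bytes(RAX))
    # 8b 50 08 decodes as: mov edx, DWORD PTR [rax+0x8]
    first = faulting_bytes(CODE_LINE)[:3]
    print("faulting insn bytes:", " ".join(f"{b:02x}" for b in first))
```

For comparison, the kernel pointers elsewhere in the dump (for example RBX `ffff8d0d34914400`) pass `is_canonical`, so RAX stands out as the bad value.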

Regards,
John