SymbioticLab / Infiniswap

Infiniswap enables unmodified applications to efficiently use disaggregated memory.

NULL bio structure #14

Open leeymcj opened 6 years ago

leeymcj commented 6 years ago

Newer kernels (>4.x) generate requests with a NULL bio. stackbd tries to clone that bio and panics with a NULL pointer dereference.

blakecaldwell commented 5 years ago

Thanks for noting this. I think I have run into the same issue, but with kernel 3.17. Do you have a fix that handles this?

blakecaldwell commented 5 years ago

This appears to be reproducible even with 3.13 (CentOS 7.5, kernel 3.13.0, MLNX OFED 3.4).

Below is the kernel oops that appears right after `swapon /dev/infiniswap0`. Sometimes the error doesn't appear until the swap device is actually in use, but it always appears soon after 10-100 MB have been swapped out.

```
[ 728.619403] In IS_session_create() with portal: rdma://1,10.10.10.4:9400,
[ 728.626210] rdma://1,10.10.10.4:9400,
[ 728.629876] portal: 10.10.10.4, 9400
[ 733.985157] IS_register_block_device, dev_name infiniswap0
[ 733.990649] IS: init done
[ 733.993432] stackbd: init done
[ 733.996501] Opened /dev/loop0
[ 733.999476] stackbd: Device real capacity: 104857600
[ 734.004440] stackbd: Max sectors: 8
[ 734.007965] stackbd: done initializing successfully
[ 748.056376] evict_handler, waiting for STOP msg
[ 752.832599] BUG: unable to handle kernel NULL pointer dereference at 0000000000000068
[ 752.840455] IP: [] bio_clone_bioset+0x11/0x70
[ 752.846388] PGD 7b6675067 PUD 78baba067 PMD 0
[ 752.850882] Oops: 0000 [#1] PREEMPT SMP
[ 752.854855] Modules linked in: infiniswap(OF) xt_CT xt_mac xt_comment xt_physdev xt_set ip_set_hash_net ip_set iptable_raw xt_CHECKSUM iptable_mangle ipt_REJECT rbd libceph ebtable_filter ebtables xt_nat xt_tcpudp openvswitch gre ipt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack veth rdma_ucm(OF) ib_ucm(OF) rdma_cm(OF) iw_cm(OF) configfs ib_ipoib(OF) ib_cm(OF) ib_uverbs(OF) ib_umad(OF) mlx5_ib(OF) mlx5_core(OF) iTCO_wdt gpio_ich iTCO_vendor_support mlx4_en(OF) mlx4_ib(OF) ib_sa(OF) ib_mad(OF) ib_core(OF) ib_addr(OF) ib_netlink(OF) ipmi_devintf coretemp x86_pkg_temp_thermal kvm_intel kvm dm_thin_pool dm_persistent_data crct10dif_pclmul crc32_pclmul dm_bufio ghash_clmulni_intel dm_bio_prison aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd vxlan ip_tunnel lpc_ich i2c_i801 pcspkr mfd_core shpchp joydev wmi ipmi_si ipmi_msghandler acpi_power_meter evbug mlx4_core(OF) mlx_compat(OF) ip_tables x_tables nfsv3 nfs_acl nfs lockd fscache bridge stp llc hid_generic igb usbkbd usbmouse usbhid hid i2c_algo_bit dca hwmon ahci ptp libahci pps_core sunrpc ipv6 autofs4
[ 752.963065] CPU: 4 PID: 104 Comm: kswapd0 Tainted: GF O 3.13.0-scaleos #1
[ 752.970627] Hardware name: Supermicro SYS-1028TR-TF/X10DRT-LIBF, BIOS 2.0 12/17/2015
[ 752.978365] task: ffff88105b9a1a10 ti: ffff88085be54000 task.ti: ffff88085be54000
[ 752.985842] RIP: 0010:[] [] bio_clone_bioset+0x11/0x70
[ 752.994201] RSP: 0018:ffff88085be55578 EFLAGS: 00010246
[ 752.999511] RAX: ffff88104fa4b000 RBX: ffff88104fb9d728 RCX: 0000000000000000
[ 753.006639] RDX: ffff88085c143800 RSI: 0000000000000020 RDI: 0000000000000000
[ 753.013769] RBP: ffff88085be55590 R08: ffff8807a68beee0 R09: 0000000017027000
[ 753.020898] R10: ffffc90020192480 R11: 00000000000141c0 R12: 0000000000000000
[ 753.028028] R13: 0000000000000020 R14: ffff88104fb9d728 R15: ffff880859f9ca80
[ 753.035159] FS: 0000000000000000(0000) GS:ffff88085fd00000(0000) knlGS:0000000000000000
[ 753.043242] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 753.048983] CR2: 0000000000000068 CR3: 00000007b9c08000 CR4: 00000000003407e0
[ 753.056113] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 753.063243] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 753.070372] Stack:
[ 753.072383] ffff88104fb9d728 0000000000000000 0000000000000000 ffff88085be555b0
[ 753.079845] ffffffffa06ce93c 0000000000000000 ffff88104fb9d738 ffff88085be55618
[ 753.087304] ffffffffa06ceb9b ffff880074989000 0000000100000000 0000000017027000
[ 753.094763] Call Trace:
[ 753.097213] [] stackbd_bio_generate+0x2c/0xa0 [infiniswap]
[ 753.104349] [] IS_rdma_write+0x14b/0x1f0 [infiniswap]
[ 753.111044] [] IS_transfer_chunk+0x74/0xd0 [infiniswap]
[ 753.117915] [] IS_queue_rq+0x224/0x390 [infiniswap]
[ 753.124438] [] __blk_mq_run_hw_queue+0x1c3/0x3f0
[ 753.130700] [] blk_mq_run_hw_queue+0x35/0x40
[ 753.136613] [] blk_mq_insert_requests+0xba/0x140
[ 753.142877] [] blk_mq_flush_plug_list+0x129/0x140
[ 753.149227] [] blk_flush_plug_list+0xd9/0x230
[ 753.155228] [] blk_mq_make_request+0x37a/0x4e0
[ 753.161316] [] generic_make_request+0xc2/0x110
[ 753.167403] [] submit_bio+0x71/0x150
[ 753.172628] [] ? test_set_page_writeback+0x115/0x180
[ 753.179235] [] __swap_writepage+0x164/0x210
[ 753.185066] [] ? _raw_spin_lock+0x17/0x60
[ 753.190719] [] ? _raw_spin_unlock+0x1c/0x60
[ 753.196548] [] ? page_swapcount+0x4c/0x60
[ 753.202202] [] swap_writepage+0x39/0x70
[ 753.207685] [] shmem_writepage+0x198/0x2d0
[ 753.213427] [] shrink_page_list+0x47b/0x9f0
[ 753.219255] [] shrink_inactive_list+0x228/0x4c0
[ 753.225432] [] shrink_lruvec+0x4d1/0x650
[ 753.230999] [] shrink_zone+0x31/0x100
[ 753.236308] [] balance_pgdat+0x386/0x5b0
[ 753.241874] [] kswapd+0x156/0x440
[ 753.246837] [] ? prepare_to_wait_event+0x100/0x100
[ 753.253274] [] ? balance_pgdat+0x5b0/0x5b0
[ 753.259016] [] kthread+0xc9/0xe0
[ 753.263888] [] ? kthread_create_on_node+0x190/0x190
[ 753.270412] [] ret_from_fork+0x7c/0xb0
[ 753.275806] [] ? kthread_create_on_node+0x190/0x190
[ 753.282325] Code: 41 5c 5d c3 66 0f 1f 44 00 00 48 89 df e8 e8 62 fb ff eb 92 0f 0b 0f 1f 40 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 89 f5 41 54 53 <8b> 77 68 48 89 fb 44 89 ef e8 21 fa ff ff 48 85 c0 49 89 c4 74
[ 753.302303] RIP [] bio_clone_bioset+0x11/0x70
[ 753.308338] RSP
[ 753.311824] CR2: 0000000000000068
```
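The oops itself points at a NULL bio. The faulting bytes in the `Code:` line are `<8b> 77 68`, and under the standard x86-64 encoding (opcode 0x8b is MOV r32, r/m32, followed by a ModRM byte and an 8-bit displacement) they decode to `mov esi, [rdi + 0x68]`. RDI holds the first argument of `bio_clone_bioset()` under the x86-64 calling convention, and the register dump shows `RDI: 0000000000000000`, so the load address is 0 + 0x68 = 0x68, which is exactly `CR2`. The following minimal C sketch performs that decoding; `mov_disp8`, `decode_mov_disp8()` and `effective_address()` are hypothetical helpers written for this illustration, and they handle only this one addressing form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decoded form of "MOV r32, [base + disp8]" (opcode 0x8b, ModRM mod == 01).
 * Register indices follow the x86 encoding: 0 = ax, 1 = cx, 2 = dx, 3 = bx,
 * 4 = sp, 5 = bp, 6 = si, 7 = di (no REX prefix, so r8-r15 cannot appear).
 */
struct mov_disp8 {
    unsigned int reg;   /* destination register index (6 = esi) */
    unsigned int rm;    /* base register index (7 = rdi) */
    int8_t disp8;       /* signed 8-bit displacement */
};

/* Returns 0 and fills *out if code[] starts with MOV r32, [base + disp8]. */
static int decode_mov_disp8(const uint8_t code[3], struct mov_disp8 *out)
{
    unsigned int mod;

    assert(code != NULL && out != NULL);
    if (code[0] != 0x8b)
        return -1;                      /* not MOV r32, r/m32 */
    mod = code[1] >> 6;                 /* ModRM bits 7-6 */
    out->reg = (code[1] >> 3) & 7u;     /* ModRM bits 5-3 */
    out->rm = code[1] & 7u;             /* ModRM bits 2-0 */
    if (mod != 1 || out->rm == 4)
        return -1;                      /* only [base + disp8] without a SIB byte */
    out->disp8 = (int8_t)code[2];
    return 0;
}

/* Address the CPU dereferences: base register value plus sign-extended disp8. */
static uint64_t effective_address(uint64_t base, const struct mov_disp8 *insn)
{
    return base + (uint64_t)(int64_t)insn->disp8;
}
```

Feeding it the bytes `8b 77 68` with a base of 0 (the RDI value from the dump) yields destination register 6 (esi), base register 7 (rdi), displacement 0x68 and an effective address of 0x68, matching `CR2`. In other words, `stackbd_bio_generate()` passed a NULL bio pointer into `bio_clone_bioset()`.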

If I check for `req->bio == NULL` before calling `bio_clone()`, the check does hit, but the kernel module still fails because the request is then never handled.
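Skipping the clone is only half of a fix: a blk-mq request that the driver never completes stays pending, which is consistent with the failure described above. The following userspace model is a sketch, not Infiniswap code. `model_bio`, `model_request`, `forward_bio()`, `complete_request()` and `dispatch_request()` are hypothetical stand-ins for the kernel's `struct bio`, `struct request`, the stackbd clone-and-submit step and the blk-mq completion call. It assumes that a request arriving without a bio carries no data to transfer (as with an empty flush) and can be acknowledged immediately; whether that is safe for Infiniswap's write path would need to be checked against the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct bio. */
struct model_bio {
    struct model_bio *next;   /* like bio->bi_next: next bio in the request */
    unsigned int bytes;       /* payload size of this bio */
};

/* Hypothetical, simplified stand-in for struct request. */
struct model_request {
    struct model_bio *bio;    /* first bio; may be NULL (e.g. an empty flush) */
    int completed;            /* set once the request has been ended */
    int status;               /* 0 on success */
};

/* Hypothetical stand-in for "clone the bio and submit it to the backing device". */
static unsigned int forward_bio(const struct model_bio *bio)
{
    return bio->bytes;
}

/* Ends the request exactly once, as a blk-mq completion call would. */
static void complete_request(struct model_request *rq, int status)
{
    assert(!rq->completed);   /* completing twice would be a driver bug */
    rq->completed = 1;
    rq->status = status;
}

/*
 * Dispatches one request. A request without a bio is completed right away
 * instead of being left pending; otherwise every bio in the chain is
 * forwarded. Returns the number of bytes forwarded.
 */
static unsigned int dispatch_request(struct model_request *rq)
{
    const struct model_bio *bio;
    unsigned int total = 0;

    assert(rq != NULL);
    if (rq->bio == NULL) {
        /* Nothing to clone: acknowledge the request so it does not hang. */
        complete_request(rq, 0);
        return 0;
    }
    for (bio = rq->bio; bio != NULL; bio = bio->next)
        total += forward_bio(bio);
    /* Synchronous model only: a real driver would complete data requests
     * from its I/O completion path instead of inline. */
    complete_request(rq, 0);
    return total;
}
```

In a kernel driver the completion step would be the blk-mq completion call for the kernel in use (for example `blk_mq_end_request()` on 4.x kernels), and requests that do carry bios would normally be completed from the RDMA completion handler rather than inline as in this synchronous model.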