acozzette / BUSE

A block device in user space for Linux
GNU General Public License v2.0

Kernel crash in failover testing #16

Open mehulvora83 opened 6 years ago

mehulvora83 commented 6 years ago

Hey,

I ran into a kernel crash while testing BUSE failover. Here is the stack dump from my Ubuntu 16.04 box.

[65092.911201] ------------[ cut here ]------------
[65092.911209] kernel BUG at /build/linux-0XAgc4/linux-4.4.0/fs/buffer.c:3005!
[65092.911212] invalid opcode: 0000 [#1] SMP
[65092.911215] Modules linked in: nbd ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter ip_tables xt_conntrack x_tables nf_nat nf_conntrack br_netfilter bridge stp llc aufs binfmt_misc snd_hda_codec_hdmi intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp hp_wmi sparse_keymap kvm_intel kvm snd_hda_codec_realtek input_leds snd_hda_codec_generic irqbypass snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm serio_raw sb_edac edac_core snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer snd lpc_ich mei_me soundcore mei shpchp tpm_infineon 8250_fintek mac_hid ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi
[65092.911288] scsi_transport_iscsi parport_pc ppdev lp parport autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel nouveau aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd mxm_wmi video i2c_algo_bit ttm drm_kms_helper e1000e psmouse syscopyarea sysfillrect sysimgblt fb_sys_fops ptp ahci drm pps_core libahci wmi fjes
[65092.912379] CPU: 5 PID: 148020 Comm: ls Not tainted 4.4.0-78-generic #99-Ubuntu
[65092.913112] Hardware name: Hewlett-Packard HP Z440 Workstation/212B, BIOS M60 v02.31 12/14/2016
[65092.913848] task: ffff8807e7eee200 ti: ffff8807e7690000 task.ti: ffff8807e7690000
[65092.914580] RIP: 0010:[] [] submit_bh_wbc+0x152/0x160
[65092.915311] RSP: 0018:ffff8807e7693af0 EFLAGS: 00010246
[65092.916043] RAX: 0000000000200005 RBX: ffff880034542f08 RCX: 0000000000000000
[65092.916783] RDX: 0000000000000000 RSI: ffff880034542f08 RDI: 0000000000001411
[65092.917515] RBP: ffff8807e7693b18 R08: 0000000000000000 R09: 0000000000000fff
[65092.918222] R10: 0000000000000000 R11: 00000000000005d5 R12: 0000000000001411
[65092.918944] R13: 000000000004430c R14: ffff8807e6b9b400 R15: ffff88080352b800
[65092.919649] FS: 00007f9019fb0800(0000) GS:ffff88080c740000(0000) knlGS:0000000000000000
[65092.920365] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[65092.921089] CR2: 0000000001c0afa4 CR3: 00000007e7380000 CR4: 00000000001406e0
[65092.921799] Stack:
[65092.922501] ffff880034542f08 0000000000001411 000000000004430c ffff8807e6b9b400
[65092.923226] ffff88080352b800 ffff8807e7693b38 ffffffff812497bc ffffffff81f38d80
[65092.923935] ffff880034542f08 ffff8807e7693b80 ffffffff812bbf42 000000000000115c
[65092.924646] Call Trace:
[65092.925365] [<ffffffff812497bc>] sync_dirty_buffer+0x6c/0x100
[65092.926081] [<ffffffff812bbf42>] ext4_commit_super+0x1d2/0x290
[65092.926799] [<ffffffff812bc39b>] ext4_error_inode+0x9b/0x170
[65092.927507] [<ffffffff812469d2>] ? wait_on_buffer+0x32/0x40
[65092.928213] [<ffffffff81293fd9>] ext4_check_dir_entry+0x109/0x130
[65092.928923] [<ffffffff812a65c3>] htree_dirblock_to_tree+0x153/0x270
[65092.929619] [<ffffffff812a7732>] ext4_htree_fill_tree+0xb2/0x2e0
[65092.930321] [<ffffffff811ed392>] ? kmem_cache_alloc_trace+0x1d2/0x1f0
[65092.931003] [<ffffffff81294728>] ext4_readdir+0x728/0xa00
[65092.931689] [<ffffffff81223122>] iterate_dir+0x92/0x120
[65092.932361] [<ffffffff812235c9>] SyS_getdents+0x99/0x110
[65092.933027] [<ffffffff812231b0>] ? iterate_dir+0x120/0x120
[65092.933690] [<ffffffff81840a32>] entry_SYSCALL_64_fastpath+0x16/0x71
[65092.934348] Code: 44 89 ef e8 81 14 18 00 5b 31 c0 41 5c 41 5d 41 5e 41 5f 5d c3 40 f6 c7 01 0f 84 1c ff ff ff f0 80 63 01 f7 e9 12 ff ff ff 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 0f 0b 0f 1f 40 00 0f 1f 44 00 00 55 31
[65092.935724] RIP [<ffffffff81247a62>] submit_bh_wbc+0x152/0x160
[65092.936399] RSP <ffff8807e7693af0>

The issue can be reproduced with the following commands:

modprobe nbd
./busexmp /dev/nbd0
mkfs.ext4 /dev/nbd0
mount /dev/nbd0 /mnt
while true; do ls /mnt/; done &
kill -9 <pid of busexmp parent process>
./busexmp /dev/nbd0

Restarting busexmp (the last command) causes the kernel crash.

In general, is failover of the busexmp/nbd driver supported? I saw a similar crash with nbd-server/nbd-client as well, so this crash might not be specific to BUSE.

Thanks, Mehul.