datto / dattobd

kernel module for taking block-level snapshots and incremental backups of Linux block devices
GNU General Public License v2.0

Kernel panic with XFS filesystem #184

Closed pomipomi closed 5 years ago

pomipomi commented 5 years ago

Environment: latest dattobd running on Ubuntu 18.04 (Linux ubuntu 4.15.0-29-generic #31-Ubuntu SMP Tue Jul 17 15:39:52 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux)

Process to reproduce:

  1. Set up a snapshot with dbdctl setup-snapshot
  2. Transition to incremental
  3. Add/edit some files
  4. Transition to snapshot; a kernel panic sometimes happens at this step

The system consists of two disks:

NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0    7:0    0   2.3M  1 loop /snap/gnome-calculator/238
loop1    7:1    0   2.3M  1 loop /snap/gnome-calculator/180
loop2    7:2    0  14.5M  1 loop /snap/gnome-logs/37
loop3    7:3    0  42.1M  1 loop /snap/gtk-common-themes/701
loop4    7:4    0  34.7M  1 loop /snap/gtk-common-themes/319
loop5    7:5    0   3.7M  1 loop /snap/gnome-system-monitor/57
loop6    7:6    0    13M  1 loop /snap/gnome-characters/103
loop7    7:7    0  14.5M  1 loop /snap/gnome-logs/43
loop8    7:8    0  86.9M  1 loop /snap/core/4917
loop9    7:9    0    13M  1 loop /snap/gnome-characters/124
loop10   7:10   0 140.9M  1 loop /snap/gnome-3-26-1604/70
loop11   7:11   0   3.7M  1 loop /snap/gnome-system-monitor/51
loop12   7:12   0  1008K  1 loop /snap/gnome-logs/61
loop13   7:13   0  53.7M  1 loop /snap/core18/941
loop14   7:14   0  89.4M  1 loop /snap/core/6818
loop15   7:15   0  87.9M  1 loop /snap/core/5548
loop16   7:16   0  14.8M  1 loop /snap/gnome-characters/258
loop17   7:17   0     4M  1 loop /snap/gnome-calculator/406
loop18   7:18   0  35.3M  1 loop /snap/gtk-common-themes/1198
loop19   7:19   0 140.7M  1 loop /snap/gnome-3-26-1604/82
loop20   7:20   0   151M  1 loop /snap/gnome-3-28-1804/40
sda      8:0    0    20G  0 disk
└─sda1   8:1    0    20G  0 part /
sdb      8:16   0    20G  0 disk
└─sdb1   8:17   0     1G  0 part /mnt
sr0     11:0    1  1024M  0 rom
datto0 252:0    0     1G  1 disk
/ is ext2 and /mnt is XFS (created with mkfs.xfs using the default options). The XFS filesystem is mounted with the default mount options:
/dev/sdb1 on /mnt type xfs (rw,relatime,attr2,inode64,noquota)
The kernel panic output is:

[ 1254.988317] general protection fault: 0000 [#1] SMP PTI
[ 1254.988320] Modules linked in: xfs libcrc32c coretemp crct10dif_pclmul crc32_pclmul snd_ens1371 snd_ac97_codec ghash_clmulni_intel gameport ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi vmw_balloon pcbc snd_seq snd_seq_device snd_timer joydev aesni_intel aes_x86_64 crypto_simd glue_helper input_leds snd cryptd intel_rapl_perf serio_raw soundcore vmw_vsock_vmci_transport vsock shpchp vmw_vmci mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 dattobd(OE) hid_generic usbhid hid psmouse vmwgfx ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm mptspi ahci e1000 libahci mptscsih mptbase scsi_transport_spi i2c_piix4 pata_acpi
[ 1254.988351] CPU: 0 PID: 1974 Comm: xfsaild/sdb1 Tainted: G           OE    4.15.0-29-generic #31-Ubuntu
[ 1254.988352] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 1254.988356] RIP: 0010:tracing_mrf+0x2cb/0x7a0 [dattobd]
[ 1254.988357] RSP: 0018:ffffaaf1c10f7b60 EFLAGS: 00010246
[ 1254.988372] RAX: dead0000ffffffff RBX: 0000000000000200 RCX: 0000000000000000
[ 1254.988373] RDX: 0000000000000000 RSI: dead000000000400 RDI: 0000000000000200
[ 1254.988374] RBP: ffffaaf1c10f7bc0 R08: 0000000000000000 R09: ffff9d9769aa3a00
[ 1254.988375] R10: ffff9d9769aa3a88 R11: 0000000000000000 R12: 0000000000000802
[ 1254.988375] R13: 0000000000000200 R14: 0000000000000000 R15: ffff9d9771047c00
[ 1254.988377] FS:  0000000000000000(0000) GS:ffff9d9779600000(0000) knlGS:0000000000000000
[ 1254.988378] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1254.988379] CR2: 000000c4204b7000 CR3: 000000002ba0a003 CR4: 00000000003606f0
[ 1254.988415] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1254.988416] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1254.988417] Call Trace:
[ 1254.988421]  ? mempool_alloc_slab+0x15/0x20
[ 1254.988424]  ? wait_woken+0x80/0x80
[ 1254.988427]  generic_make_request+0x124/0x300
[ 1254.988428]  submit_bio+0x73/0x150
[ 1254.988430]  ? submit_bio+0x73/0x150
[ 1254.988453]  _xfs_buf_ioapply+0x31e/0x4e0 [xfs]
[ 1254.988468]  ? xfs_buf_submit+0x76/0x200 [xfs]
[ 1254.988481]  ? xfs_buf_delwri_submit_buffers+0xfa/0x290 [xfs]
[ 1254.988492]  xfs_buf_submit+0x65/0x200 [xfs]
[ 1254.988503]  ? xfs_buf_submit+0x65/0x200 [xfs]
[ 1254.988514]  xfs_buf_delwri_submit_buffers+0xfa/0x290 [xfs]
[ 1254.988526]  ? xfs_buf_delwri_submit_nowait+0x10/0x20 [xfs]
[ 1254.988537]  xfs_buf_delwri_submit_nowait+0x10/0x20 [xfs]
[ 1254.988548]  ? xfs_buf_delwri_submit_nowait+0x10/0x20 [xfs]
[ 1254.988564]  xfsaild+0x378/0x7b0 [xfs]
[ 1254.988567]  ? __schedule+0x299/0x8a0
[ 1254.988571]  kthread+0x121/0x140
[ 1254.988585]  ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[ 1254.988587]  ? kthread+0x121/0x140
[ 1254.988601]  ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[ 1254.988603]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 1254.988605]  ret_from_fork+0x35/0x40
[ 1254.988607] Code: 72 20 48 89 c2 40 f6 c6 01 0f 85 91 01 00 00 48 be 00 04 00 00 00 00 ad de 48 39 f0 0f 84 6c 01 00 00 83 e2 01 0f 85 63 01 00 00 <48> 8b 00 49 39 47 68 0f 84 fe 00 00 00 45 85 c0 44 89 e8 75 0a
[ 1254.988629] RIP: tracing_mrf+0x2cb/0x7a0 [dattobd] RSP: ffffaaf1c10f7b60
[ 1254.988631] ---[ end trace 864614b6dcc729dd ]---
[ 1254.988635] WARNING: CPU: 0 PID: 1974 at /build/linux-60XibS/linux-4.15.0/kernel/exit.c:771 do_exit+0x51/0xb40
[ 1254.988636] Modules linked in: xfs libcrc32c coretemp crct10dif_pclmul crc32_pclmul snd_ens1371 snd_ac97_codec ghash_clmulni_intel gameport ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi vmw_balloon pcbc snd_seq snd_seq_device snd_timer joydev aesni_intel aes_x86_64 crypto_simd glue_helper input_leds snd cryptd intel_rapl_perf serio_raw soundcore vmw_vsock_vmci_transport vsock shpchp vmw_vmci mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 dattobd(OE) hid_generic usbhid hid psmouse vmwgfx ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm mptspi ahci e1000 libahci mptscsih mptbase scsi_transport_spi i2c_piix4 pata_acpi
[ 1254.988658] CPU: 0 PID: 1974 Comm: xfsaild/sdb1 Tainted: G      D    OE    4.15.0-29-generic #31-Ubuntu
[ 1254.988659] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 1254.988661] RIP: 0010:do_exit+0x51/0xb40
[ 1254.988662] RSP: 0018:ffffaaf1c10f7ee0 EFLAGS: 00010202
[ 1254.988663] RAX: ffffaaf1c10f7db0 RBX: ffff9d96f3fe8000 RCX: 0000000000000000
[ 1254.988664] RDX: ffff9d96db1afc00 RSI: 0000000000000000 RDI: ffffffffbb4e28c0
[ 1254.988665] RBP: ffffaaf1c10f7f48 R08: 000000005b2d2d2d R09: 000000000000068b
[ 1254.988666] R10: ffffffffbb406a80 R11: 61727420646e6520 R12: 000000000000000b
[ 1254.988667] R13: ffffaaf1c10f7ab8 R14: 0000000000000000 R15: 0000000000000000
[ 1254.988668] FS:  0000000000000000(0000) GS:ffff9d9779600000(0000) knlGS:0000000000000000
[ 1254.988669] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1254.988670] CR2: 000000c4204b7000 CR3: 000000002ba0a003 CR4: 00000000003606f0
[ 1254.988694] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1254.988696] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 1254.988696] Call Trace:
[ 1254.988699]  ? kthread+0x121/0x140
[ 1254.988713]  ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[ 1254.988715]  ? kthread+0x121/0x140
[ 1254.988717]  rewind_stack_do_exit+0x17/0x20
[ 1254.988719] Code: 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 e8 27 56 07 00 48 8b 83 68 0b 00 00 48 85 c0 74 0e 48 8b 10 48 39 d0 0f 84 88 04 00 00 <0f> 0b 65 8b 05 b6 52 f8 45 25 00 ff 1f 00 89 45 9c 0f 85 61 08
[ 1254.988738] ---[ end trace 864614b6dcc729de ]---
crawfxrd commented 5 years ago

To confirm, you are using dattobd 0.10.9?

Can you get the line for the RIP?

gdb /path/to/dattobd.ko
(gdb) list *(tracing_mrf+0x2cb)
pomipomi commented 5 years ago

The file "dattobd.ko" seems to be located in "/usr/src/dattobd-0.10.9" (there is no such file in /path/to), so I used:

gdb /usr/src/dattobd-0.10.9/dattobd.ko
Here is the line for the RIP:
root@ubuntu:/# gdb /usr/src/dattobd-0.10.9/dattobd.ko
GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/src/dattobd-0.10.9/dattobd.ko...done.
(gdb) list *(tracing_mrf+0x2cb)
0x321b is in tracing_mrf (/usr/src/dattobd-0.10.9/dattobd.c:2385).
2380            if(PageAnon(pg)) return NULL;
2381    //#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,5,0)
2382    #ifdef TAIL_MAPPING
2383            if (pg->mapping == TAIL_MAPPING) return NULL;
2384    #endif
2385            if(!pg->mapping->host) return NULL;
2386            return pg->mapping->host;
2387    }
2388
2389    static int bio_needs_cow(struct bio *bio, struct inode *inode){
And here is my page_get_inode function in /usr/src/dattobd-0.10.9/dattobd.c:
static inline struct inode *page_get_inode(struct page *pg){
        if(!pg) return NULL;
        if(!pg->mapping) return NULL;
        if(PageAnon(pg)) return NULL;
//#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,5,0)
#ifdef TAIL_MAPPING
        if (pg->mapping == TAIL_MAPPING) return NULL;
#endif
        if(!pg->mapping->host) return NULL;
        return pg->mapping->host;
}
crawfxrd commented 5 years ago

This appears to be the same thing as #40.

Try running:

grep -r "TAIL_MAPPING" /usr/src/linux-headers-4.15.0-29/

It should output:

/usr/src/linux-headers-4.15.0-29/include/linux/poison.h:#define TAIL_MAPPING    ((void *) 0x400 + POISON_POINTER_DELTA)

If it does, then there is something wrong with the build of dattobd. In that case, I suggest rebuilding/reinstalling dattobd and trying again.

pomipomi commented 5 years ago

After using

grep -r "TAIL_MAPPING" /usr/src/linux-headers-4.15.0-29/
I get the same output you described:
/usr/src/linux-headers-4.15.0-29/include/linux/poison.h:#define TAIL_MAPPING    ((void *) 0x400 + POISON_POINTER_DELTA)
Does this mean my dattobd needs to be rebuilt/reinstalled? I tried rebuilding it and testing dbdctl again, but the kernel panic still happened. Could I also ask what the correct output of grepping for TAIL_MAPPING should be?

Some details that might help:

  1. The environment I used previously was Ubuntu 18.04 with 4.15.0-29, running on VMware.
  2. I reinstalled Ubuntu 18.04 with 4.15.0-50 in a VM, but the same panic happened.
  3. I installed Ubuntu 18.04 with 4.15.0-50 on my personal computer (not a VM) and ran dbdctl several times, but the kernel panic did not occur this time.
  4. All three machines give the same output when grepping for "TAIL_MAPPING".
crawfxrd commented 5 years ago

I was able to reproduce this on 4.15.

May 21 08:07:13 segv-u1804 kernel: TAIL_MAPPING = dead000000000400
May 21 08:07:13 segv-u1804 kernel: pg->mapping  = dead0000ffffffff

I don't know where this value comes from. It seems bogus (-1 cast to UL?).


I have not seen the issue on 4.18.0-20-generic. This kernel produces the expected value:

May 21 08:16:05 segv-u1804 kernel: TAIL_MAPPING = dead000000000400
May 21 08:16:05 segv-u1804 kernel: pg->mapping  = dead000000000400
pomipomi commented 5 years ago

Just to make sure, what kind of machine did you use to reproduce the problem (PC or VM)?

Maybe skipping addresses above TAIL_MAPPING, or some specific values, could handle this case?

crawfxrd commented 5 years ago

I am using a qemu VM.

@tcaputi any idea on this one?

tcaputi commented 5 years ago

I don't really have a ton of time to look at this now, but this seems to be the same issue that affected the crash utility: https://www.redhat.com/archives/crash-utility/2016-February/msg00010.html

We should probably just find the commit that crash made and do whatever they did.

crawfxrd commented 5 years ago

Looks like using the head (a la pg = compound_head(pg)) instead of checking for a tail would work.

Of interest is page_mapping().

crawfxrd commented 5 years ago

@pomipomi could you try #187?

pomipomi commented 5 years ago

I used a script to test the patch, and it looks like the kernel panic is solved! (Testing process: add and change files -> transition to incremental -> add and change files -> transition to snapshot)

Thanks for your help!