anholt / linux

vc4: refcount API complaint due to misuse in MADV #129

Closed nullr0ute closed 5 years ago

nullr0ute commented 6 years ago

On a Raspberry Pi 3 running Fedora 27 (ARMv7, 32-bit) I've seen this use-after-free warning. Not sure if it's reproducible, but I'll keep an eye out as I test 4.15 more widely.

[  224.202345] alloc_contig_range: 4 callbacks suppressed
[  224.202354] alloc_contig_range: [2c200, 2d200) PFNs busy
[  224.216771] alloc_contig_range: [2c200, 2d300) PFNs busy
[  224.226177] alloc_contig_range: [2c400, 2d400) PFNs busy
[  224.238606] alloc_contig_range: [2c400, 2d500) PFNs busy
[  224.254055] alloc_contig_range: [2c400, 2d600) PFNs busy
[  224.266467] alloc_contig_range: [2c400, 2d700) PFNs busy
[  224.275460] alloc_contig_range: [2c800, 2d800) PFNs busy
[  224.284391] alloc_contig_range: [2c800, 2d900) PFNs busy
[  224.293236] alloc_contig_range: [2c800, 2da00) PFNs busy
[  224.302083] alloc_contig_range: [2c800, 2db00) PFNs busy
[  227.950421] ------------[ cut here ]------------
[  227.955220] WARNING: CPU: 0 PID: 1317 at lib/refcount.c:281 refcount_dec_not_one+0x8c/0xb8
[  227.963696] refcount_t: underflow; use-after-free.
[  227.968631] Modules linked in: vfat fat rfcomm fuse nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables bnep sunrpc rc_cec vc4 snd_soc_core ac97_bus snd_pcm_dmaengine snd_seq snd_seq_device snd_pcm snd_timer snd soundcore cec rc_core drm_kms_helper joydev drm hci_uart brcmfmac btbcm btqca brcmutil btintel bluetooth fb_sys_fops syscopyarea cfg80211 sysfillrect sysimgblt ecdh_generic rfkill bcm2835_thermal
[  228.040690]  bcm2835_wdt bcm2835_rng leds_gpio hid_logitech_hidpp hid_logitech_dj smsc95xx usbnet mii mmc_block dwc2 crc32_arm_ce sdhci_iproc sdhci_pltfm udc_core sdhci bcm2835_dma pwm_bcm2835 i2c_bcm2835 bcm2835 phy_generic
[  228.061023] CPU: 0 PID: 1317 Comm: gnome-shell Not tainted 4.15.2-300.fc27.armv7hl #1
[  228.068966] Hardware name: BCM2835
[  228.072448] [] (unwind_backtrace) from [] (show_stack+0x18/0x1c)
[  228.080315] [] (show_stack) from [] (dump_stack+0x80/0xa0)
[  228.087651] [] (dump_stack) from [] (__warn+0xdc/0xf8)
[  228.094635] [] (__warn) from [] (warn_slowpath_fmt+0x3c/0x4c)
[  228.102240] [] (warn_slowpath_fmt) from [] (refcount_dec_not_one+0x8c/0xb8)
[  228.111163] [] (refcount_dec_not_one) from [] (vc4_bo_dec_usecnt+0x1c/0x78 [vc4])
[  228.120784] [] (vc4_bo_dec_usecnt [vc4]) from [] (drm_atomic_helper_cleanup_planes+0x60/0x68 [drm_kms_helper])
[  228.132902] [] (drm_atomic_helper_cleanup_planes [drm_kms_helper]) from [] (vc4_atomic_complete_commit+0x84/0xc8 [vc4])
[  228.145747] [] (vc4_atomic_complete_commit [vc4]) from [] (vc4_atomic_commit+0x118/0x124 [vc4])
[  228.156528] [] (vc4_atomic_commit [vc4]) from [] (drm_atomic_helper_disable_plane+0xbc/0xc0 [drm_kms_helper])
[  228.168815] [] (drm_atomic_helper_disable_plane [drm_kms_helper]) from [] (__setplane_internal+0x48/0x1e0 [drm])
[  228.181486] [] (__setplane_internal [drm]) from [] (drm_mode_cursor_universal+0x158/0x1bc [drm])
[  228.192717] [] (drm_mode_cursor_universal [drm]) from [] (drm_mode_cursor_common+0xd8/0x1d0 [drm])
[  228.204119] [] (drm_mode_cursor_common [drm]) from [] (drm_ioctl+0x2b8/0x348 [drm])
[  228.213934] [] (drm_ioctl [drm]) from [] (vfs_ioctl+0x28/0x3c)
[  228.221631] [] (vfs_ioctl) from [] (do_vfs_ioctl+0x8c/0x850)
[  228.229145] [] (do_vfs_ioctl) from [] (SyS_ioctl+0x58/0x74)
[  228.236576] [] (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x54)
[  228.244318] ---[ end trace 2f6e6444c7159640 ]---
[  290.725119] [drm] Resetting GPU.
[  308.441650] alloc_contig_range: 6 callbacks suppressed
[  308.441661] alloc_contig_range: [2c200, 2d200) PFNs busy
[  308.462519] alloc_contig_range: [2df00, 2ef00) PFNs busy
[  308.469304] alloc_contig_range: [2e000, 2f000) PFNs busy
[  308.476298] alloc_contig_range: [2e000, 2f100) PFNs busy
[  308.483442] alloc_contig_range: [2e000, 2f200) PFNs busy
[  308.490400] alloc_contig_range: [2e000, 2f300) PFNs busy
[  308.499431] alloc_contig_range: [2e400, 2f400) PFNs busy
[  308.506774] alloc_contig_range: [2e400, 2f500) PFNs busy
[  308.514067] alloc_contig_range: [2e600, 2f600) PFNs busy
[  308.521645] alloc_contig_range: [2e700, 2f700) PFNs busy
[  353.633899] alloc_contig_range: 5 callbacks suppressed
[  353.633910] alloc_contig_range: [20c54, 20c55) PFNs busy
[  355.596194] alloc_contig_range: [20ce6, 20ce7) PFNs busy
[  377.041318] alloc_contig_range: [2e600, 2edbc) PFNs busy
[  377.047977] alloc_contig_range: [2e600, 2eebc) PFNs busy
[  377.065662] alloc_contig_range: [2e800, 2efbc) PFNs busy
[  377.072379] alloc_contig_range: [2e800, 2f0bc) PFNs busy
[  377.079176] alloc_contig_range: [2e800, 2f1bc) PFNs busy
[  377.085692] alloc_contig_range: [2e800, 2f2bc) PFNs busy
[  377.091665] alloc_contig_range: [2ec00, 2f3bc) PFNs busy
[  397.019076] [drm:vc4_bo_create [vc4]] *ERROR* Failed to allocate from CMA:
[  397.026153] [drm]                         kernel:   8100kb BOs (1)
[  397.026162] [drm]                            V3D: 196524kb BOs (440)
[  397.026167] [drm]                     V3D shader:    356kb BOs (89)
[  397.026172] [drm]                           dumb:    272kb BOs (17)
[  397.026177] [drm]                            RCL:      8kb BOs (1)
[  397.026182] [drm]                            BCL:     16kb BOs (1)
[  397.026201] vc4_v3d 3fc00000.v3d: Failed to allocate memory for tile binning: -12. You may need to enable CMA or give it more memory.

On the CMA note, it has 256 MB of CMA allocated:

[    0.000000] Linux version 4.15.2-300.1.fc27.armv7hl (mockbuild@buildvm-armv7-05.arm.fedoraproject.org) (gcc version 7.3.1 20180130 (Red Hat 7.3.1-2) (GCC)) #1 SMP Sun Feb 11 15:12:45 UTC 2018

[    0.000000] Kernel command line: ro root=UUID=3293611e-970f-46ae-9b1d-e29eae96e079  cma=192MB cma=256MB LANG=en_GB.UTF-8
[    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.000000] Memory: 715924K/1021952K available (7875K kernel code, 1325K rwdata, 3764K rodata, 2048K init, 520K bss, 43884K reserved, 262144K cma-reserved, 235520K highmem)
[    0.000000] Virtual kernel memory layout:
                   vector  : 0xffff0000 - 0xffff1000   (   4 kB)
                   fixmap  : 0xffc00000 - 0xfff00000   (3072 kB)
                   vmalloc : 0xf0800000 - 0xff800000   ( 240 MB)
                   lowmem  : 0xc0000000 - 0xf0000000   ( 768 MB)
                   pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
                   modules : 0xbf000000 - 0xbfe00000   (  14 MB)
                     .text : 0x(ptrval) - 0x(ptrval)   (8868 kB)
                     .init : 0x(ptrval) - 0x(ptrval)   (2048 kB)
                     .data : 0x(ptrval) - 0x(ptrval)   (1326 kB)
                      .bss : 0x(ptrval) - 0x(ptrval)   ( 521 kB)

nullr0ute commented 6 years ago

Running GNOME Desktop as Wayland

lategoodbye commented 6 years ago

Please report this to Eric Anholt, Boris Brezillon, and the dri-devel mailing list by email.

anholt commented 6 years ago

There was a discussion about this and the conclusion was that we need to switch back to atomic_t. We lost track of the bug, it seems.

bbrezillon commented 6 years ago

Hm, not sure this is the same issue here. A false positive has been fixed in vc4_bo_inc_usecnt() https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git/commit/drivers/gpu/drm/vc4?h=v4.15.7&id=5bfd40139d55790cbc8e56ad1ce4f974f1fa186d, but maybe this one is a real use-after-free issue.

I'll have a closer look.

bbrezillon commented 6 years ago

@nullr0ute, did you find an easy way to reproduce the problem?

I had a look at the code this morning and couldn't find a case where we could hit this problem. Every time you attach a BO to a plane ->usecnt is incremented and every time you detach it from the plane it is decremented, so assuming the ->prepare_fb()/->cleanup_fb() are balanced we shouldn't see this kind of issue.

I'll keep digging, but that'd be easier to debug if you have a way to reproduce the bug.

bbrezillon commented 6 years ago

@anholt, looks like the async-plane-update path is not calling drm_atomic_helper_{prepare,cleanup}_planes() which might explain why we get an inconsistent ->usecnt.

nullr0ute commented 6 years ago

I don't, and I haven't seen it regularly on 4.16. I have been traveling, so my GUI testing has been minimal; I should be doing more soon, but focused on 4.16/4.17.

lategoodbye commented 5 years ago

@nullr0ute Is this still reproducible with Fedora 29?

nullr0ute commented 5 years ago

I think we can close it off. I don't remember seeing it, and we can always re-open.