anholt / linux


VC4 kernel framework driver - memory leak - 2 #123

Closed nkichukov closed 6 years ago

nkichukov commented 6 years ago

Hi all,

I have found a second vc4 memory leak on a Raspberry Pi 3B after https://github.com/anholt/linux/issues/122 got fixed.

Details follow below:

kernel: 4.14.0-v7+
compiler: gcc-6.4
Raspberry Pi running in 32-bit mode
CMA=256MB
OS: Gentoo Linux
Kernel configuration is attached, see below.

Whenever Kodi (v17.6) is playing video, /proc/slabinfo shows the number of kmalloc-128 objects constantly increasing and never being released:

kmalloc-128 51027 58863 384 21 2 : tunables 0 0 0 : slabdata 2803 2803 0
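To confirm that kind of growth over time, the kmalloc-128 line can be sampled periodically and compared. The following is only a minimal sketch: `SlabStat`, `parse_slabinfo_line` and `read_cache` are hypothetical helper names, and it assumes the usual slabinfo 2.1 column order (name, active_objs, num_objs, objsize, ...), which matches the line quoted above.

```python
from typing import NamedTuple


class SlabStat(NamedTuple):
    name: str
    active_objs: int
    num_objs: int
    objsize: int  # bytes per object as reported by the kernel

    @property
    def total_bytes(self) -> int:
        # Memory held by all objects (active or not) of this cache.
        return self.num_objs * self.objsize


def parse_slabinfo_line(line: str) -> SlabStat:
    """Parse one data line of /proc/slabinfo (assumed 2.1 column layout)."""
    fields = line.split()
    return SlabStat(fields[0], int(fields[1]), int(fields[2]), int(fields[3]))


def read_cache(name: str, path: str = "/proc/slabinfo") -> SlabStat:
    """Return the stats for one cache; reading the file usually needs root."""
    with open(path) as f:
        for line in f:
            if line.split()[:1] == [name]:
                return parse_slabinfo_line(line)
    raise KeyError(name)


# The line reported in this issue, used as sample input.
sample = "kmalloc-128 51027 58863 384 21 2 : tunables 0 0 0 : slabdata 2803 2803 0"
stat = parse_slabinfo_line(sample)
print(stat.name, stat.active_objs, stat.total_bytes)
```

Calling `read_cache("kmalloc-128")` every few minutes while video plays, and watching whether `active_objs` ever drops, would be one way to tell a leak from ordinary cache growth.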

The memleak pattern is like the one below (from /sys/kernel/debug/kmemleak):

unreferenced object 0x842aa340 (size 128):
  comm "X", pid 2009, jiffies 7034334 (age 301.527s)
  hex dump (first 32 bytes):
    50 3f eb b6 01 00 00 00 00 00 00 00 00 00 00 00 P?..............
    50 a3 2a 84 50 a3 2a 84 00 00 00 00 00 00 00 00 P..P..........
  backtrace:
    [<802827c8>] kmem_cache_alloc_trace+0x234/0x2f8
    [<7f167ae8>] drm_atomic_helper_setup_commit+0x1cc/0x3b0 [drm_kms_helper]
    [<7f1e2aa0>] vc4_atomic_commit+0x30/0x130 [vc4]
    [<7f0f44a0>] drm_atomic_commit+0x5c/0x68 [drm]
    [<7f0f5700>] drm_atomic_connector_commit_dpms+0xf8/0x108 [drm]
    [<7f0fae64>] drm_mode_obj_set_property_ioctl+0x1bc/0x2b4 [drm]
    [<7f0f9820>] drm_mode_connector_property_set_ioctl+0x48/0x50 [drm]
    [<7f0e296c>] drm_ioctl_kernel+0x78/0xb8 [drm]
    [<7f0e2cd4>] drm_ioctl+0x1b4/0x37c [drm]
    [<802ac090>] do_vfs_ioctl+0xb0/0x8b4
    [<802ac8d8>] SyS_ioctl+0x44/0x6c
    [<80108140>] ret_fast_syscall+0x0/0x28
    [] 0xffffffff
unreferenced object 0x83ee2340 (size 128):
  comm "X", pid 2009, jiffies 7044631 (age 291.296s)
  hex dump (first 32 bytes):
    50 3f eb b6 01 00 00 00 00 00 00 00 00 00 00 00 P?..............
    50 23 ee 83 50 23 ee 83 00 00 00 00 00 00 00 00 P#..P#..........
  backtrace:
    [<802827c8>] kmem_cache_alloc_trace+0x234/0x2f8
    [<7f167ae8>] drm_atomic_helper_setup_commit+0x1cc/0x3b0 [drm_kms_helper]
    [<7f1e2aa0>] vc4_atomic_commit+0x30/0x130 [vc4]
    [<7f0f44a0>] drm_atomic_commit+0x5c/0x68 [drm]
    [<7f0f5700>] drm_atomic_connector_commit_dpms+0xf8/0x108 [drm]
    [<7f0fae64>] drm_mode_obj_set_property_ioctl+0x1bc/0x2b4 [drm]
    [<7f0f9820>] drm_mode_connector_property_set_ioctl+0x48/0x50 [drm]
    [<7f0e296c>] drm_ioctl_kernel+0x78/0xb8 [drm]
    [<7f0e2cd4>] drm_ioctl+0x1b4/0x37c [drm]
    [<802ac090>] do_vfs_ioctl+0xb0/0x8b4
    [<802ac8d8>] SyS_ioctl+0x44/0x6c
    [<80108140>] ret_fast_syscall+0x0/0x28
    [] 0xffffffff
unreferenced object 0x84bbe040 (size 128):
  comm "X", pid 2009, jiffies 7049751 (age 286.242s)
  hex dump (first 32 bytes):
    50 3f eb b6 01 00 00 00 00 00 00 00 00 00 00 00 P?..............
    50 e0 bb 84 50 e0 bb 84 00 00 00 00 00 00 00 00 P...P...........
  backtrace:
    [<802827c8>] kmem_cache_alloc_trace+0x234/0x2f8
    [<7f167ae8>] drm_atomic_helper_setup_commit+0x1cc/0x3b0 [drm_kms_helper]
    [<7f1e2aa0>] vc4_atomic_commit+0x30/0x130 [vc4]
    [<7f0f44a0>] drm_atomic_commit+0x5c/0x68 [drm]
    [<7f0f5700>] drm_atomic_connector_commit_dpms+0xf8/0x108 [drm]
    [<7f0fae64>] drm_mode_obj_set_property_ioctl+0x1bc/0x2b4 [drm]
    [<7f0f9820>] drm_mode_connector_property_set_ioctl+0x48/0x50 [drm]
    [<7f0e296c>] drm_ioctl_kernel+0x78/0xb8 [drm]
    [<7f0e2cd4>] drm_ioctl+0x1b4/0x37c [drm]
    [<802ac090>] do_vfs_ioctl+0xb0/0x8b4
    [<802ac8d8>] SyS_ioctl+0x44/0x6c
    [<80108140>] ret_fast_syscall+0x0/0x28
    [] 0xffffffff
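All three records share the same backtrace through drm_atomic_helper_setup_commit, which is what points at a single leaking allocation site. When the kmemleak report is long, grouping records by backtrace makes that pattern easier to spot. Below is a rough sketch of such a grouping; `group_leaks` is a hypothetical helper name, the sample is a shortened copy of the report above, and the regular expression assumes the `[<address>] symbol+0xoff/0xlen` frame format seen in this report.

```python
import re
from collections import Counter

# Matches one stack frame such as "[<7f1e2aa0>] vc4_atomic_commit+0x30/0x130".
SYMBOL_RE = re.compile(r"\[<[0-9a-f]+>\]\s+(\w+)\+0x")


def group_leaks(report: str) -> Counter:
    """Count kmemleak records per backtrace (tuple of function names)."""
    counts: Counter = Counter()
    # Each record starts with the literal text "unreferenced object".
    for record in report.split("unreferenced object")[1:]:
        counts[tuple(SYMBOL_RE.findall(record))] += 1
    return counts


# Shortened sample; a real run would read /sys/kernel/debug/kmemleak instead.
sample = """\
unreferenced object 0x842aa340 (size 128):
  backtrace:
    [<802827c8>] kmem_cache_alloc_trace+0x234/0x2f8
    [<7f167ae8>] drm_atomic_helper_setup_commit+0x1cc/0x3b0 [drm_kms_helper]
    [<7f1e2aa0>] vc4_atomic_commit+0x30/0x130 [vc4]
unreferenced object 0x83ee2340 (size 128):
  backtrace:
    [<802827c8>] kmem_cache_alloc_trace+0x234/0x2f8
    [<7f167ae8>] drm_atomic_helper_setup_commit+0x1cc/0x3b0 [drm_kms_helper]
    [<7f1e2aa0>] vc4_atomic_commit+0x30/0x130 [vc4]
"""
leaks = group_leaks(sample)
for trace, n in leaks.most_common():
    print(n, " <- ".join(trace))
```

Because the function splits on the record header rather than on newlines, it should also cope with reports whose line breaks were lost in copy and paste.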

Let me know if additional information is required to track this issue down.

rpi-4.14.y-kernel_configuration_file.gz

Thank you, -Nikolay

anholt commented 6 years ago

Possibly related: https://lists.freedesktop.org/archives/dri-devel/2018-January/161625.html

nkichukov commented 6 years ago

Indeed, looks like it! I will apply the patch and see if it fixes it. Will report back as soon as I collect the results.

Thank you! -N

nkichukov commented 6 years ago

The patch would not apply on 4.14.y kernels, so I had to upgrade to raspberrypi linux 4.15-rc6:

patching file drivers/gpu/drm/drm_atomic_helper.c Hunk #1 succeeded at 3327 (offset -94 lines).

However, this is what I get when the vc4 kernel module loads at boot time:

Jan  1 01:00:28 grpi kernel: [   12.153431] vc4_hdmi 3f902000.hdmi: vc4-hdmi-hifi <-> 3f902000.hdmi mapping ok
Jan  1 01:00:28 grpi kernel: [   12.154337] vc4-drm soc:gpu: bound 3f902000.hdmi (ops vc4_hdmi_ops [vc4])
Jan  1 01:00:28 grpi kernel: [   12.154427] vc4-drm soc:gpu: bound 3f400000.hvs (ops vc4_hvs_ops [vc4])
Jan  1 01:00:28 grpi kernel: [   12.154598] vc4-drm soc:gpu: bound 3f206000.pixelvalve (ops vc4_crtc_ops [vc4])
Jan  1 01:00:28 grpi kernel: [   12.154705] vc4-drm soc:gpu: bound 3f207000.pixelvalve (ops vc4_crtc_ops [vc4])
Jan  1 01:00:28 grpi kernel: [   12.154811] vc4-drm soc:gpu: bound 3f807000.pixelvalve (ops vc4_crtc_ops [vc4])
Jan  1 01:00:28 grpi kernel: [   12.170853] ------------[ cut here ]------------
Jan  1 01:00:28 grpi kernel: [   12.170883] WARNING: CPU: 1 PID: 980 at kernel/irq/chip.c:244 __irq_startup+0xb4/0xb8
Jan  1 01:00:28 grpi kernel: [   12.170887] Modules linked in: vc4(+) snd_soc_core snd_pcm_dmaengine drm_kms_helper drm evdev cec snd_bcm2835(C) fb font smsc95xx snd_pcm usbnet mii snd_timer snd i2c_bcm2835 fixed
Jan  1 01:00:28 grpi kernel: [   12.170951] CPU: 1 PID: 980 Comm: systemd-udevd Tainted: G         C       4.15.0-rc6-v7+ #1
Jan  1 01:00:28 grpi kernel: [   12.170954] Hardware name: BCM2835
Jan  1 01:00:28 grpi kernel: [   12.170977] [<801108fc>] (unwind_backtrace) from [<8010ca9c>] (show_stack+0x20/0x24)
Jan  1 01:00:28 grpi kernel: [   12.170987] [<8010ca9c>] (show_stack) from [<8065ded8>] (dump_stack+0xc8/0x10c)
Jan  1 01:00:28 grpi kernel: [   12.170997] [<8065ded8>] (dump_stack) from [<8011e48c>] (__warn+0x104/0x11c)
Jan  1 01:00:28 grpi kernel: [   12.171007] [<8011e48c>] (__warn) from [<8011e594>] (warn_slowpath_null+0x50/0x58)
Jan  1 01:00:28 grpi kernel: [   12.171015] [<8011e594>] (warn_slowpath_null) from [<8017d288>] (__irq_startup+0xb4/0xb8)
Jan  1 01:00:28 grpi kernel: [   12.171026] [<8017d288>] (__irq_startup) from [<8017d2ec>] (irq_startup+0x60/0x128)
Jan  1 01:00:28 grpi kernel: [   12.171036] [<8017d2ec>] (irq_startup) from [<8017aca4>] (__enable_irq+0x78/0x7c)
Jan  1 01:00:28 grpi kernel: [   12.171044] [<8017aca4>] (__enable_irq) from [<8017acec>] (enable_irq+0x44/0x7c)
Jan  1 01:00:28 grpi kernel: [   12.171118] [<8017acec>] (enable_irq) from [<7f1f097c>] (vc4_irq_postinstall+0x24/0x40 [vc4])
Jan  1 01:00:28 grpi kernel: [   12.171345] [<7f1f097c>] (vc4_irq_postinstall [vc4]) from [<7f0e4184>] (drm_irq_install+0xe0/0x120 [drm])
Jan  1 01:00:28 grpi kernel: [   12.171520] [<7f0e4184>] (drm_irq_install [drm]) from [<7f1f3b00>] (vc4_v3d_bind+0x138/0x230 [vc4])
Jan  1 01:00:28 grpi kernel: [   12.171564] [<7f1f3b00>] (vc4_v3d_bind [vc4]) from [<804684b8>] (component_bind_all+0x12c/0x24c)
Jan  1 01:00:28 grpi kernel: [   12.171604] [<804684b8>] (component_bind_all) from [<7f1e54f8>] (vc4_drm_bind+0xa4/0x14c [vc4])
Jan  1 01:00:28 grpi kernel: [   12.171646] [<7f1e54f8>] (vc4_drm_bind [vc4]) from [<8046896c>] (try_to_bring_up_master+0x180/0x1bc)
Jan  1 01:00:28 grpi kernel: [   12.171654] [<8046896c>] (try_to_bring_up_master) from [<80468c38>] (component_master_add_with_match+0x9c/0xd0)
Jan  1 01:00:28 grpi kernel: [   12.171692] [<80468c38>] (component_master_add_with_match) from [<7f1e5664>] (vc4_platform_drm_probe+0xc4/0xd4 [vc4])
Jan  1 01:00:28 grpi kernel: [   12.171751] [<7f1e5664>] (vc4_platform_drm_probe [vc4]) from [<80470570>] (platform_drv_probe+0x60/0xc0)
Jan  1 01:00:28 grpi kernel: [   12.171763] [<80470570>] (platform_drv_probe) from [<8046eacc>] (driver_probe_device+0x25c/0x338)
Jan  1 01:00:28 grpi kernel: [   12.171774] [<8046eacc>] (driver_probe_device) from [<8046ec70>] (__driver_attach+0xc8/0xcc)
Jan  1 01:00:28 grpi kernel: [   12.171782] [<8046ec70>] (__driver_attach) from [<8046cc40>] (bus_for_each_dev+0x78/0xac)
Jan  1 01:00:28 grpi kernel: [   12.171791] [<8046cc40>] (bus_for_each_dev) from [<8046e3d0>] (driver_attach+0x2c/0x30)
Jan  1 01:00:28 grpi kernel: [   12.171800] [<8046e3d0>] (driver_attach) from [<8046de1c>] (bus_add_driver+0x114/0x220)
Jan  1 01:00:28 grpi kernel: [   12.171807] [<8046de1c>] (bus_add_driver) from [<8046f41c>] (driver_register+0x88/0x104)
Jan  1 01:00:28 grpi kernel: [   12.171814] [<8046f41c>] (driver_register) from [<804704bc>] (__platform_driver_register+0x50/0x58)
Jan  1 01:00:28 grpi kernel: [   12.171853] [<804704bc>] (__platform_driver_register) from [<7f209040>] (vc4_drm_register+0x40/0x4c [vc4])
Jan  1 01:00:28 grpi kernel: [   12.171910] [<7f209040>] (vc4_drm_register [vc4]) from [<80101c5c>] (do_one_initcall+0x54/0x17c)
Jan  1 01:00:28 grpi kernel: [   12.171922] [<80101c5c>] (do_one_initcall) from [<801af7dc>] (do_init_module+0x74/0x224)
Jan  1 01:00:28 grpi kernel: [   12.171931] [<801af7dc>] (do_init_module) from [<801ae888>] (load_module+0x1f04/0x25a8)
Jan  1 01:00:28 grpi kernel: [   12.171940] [<801ae888>] (load_module) from [<801af160>] (SyS_finit_module+0xb8/0xc8)
Jan  1 01:00:28 grpi kernel: [   12.171948] [<801af160>] (SyS_finit_module) from [<80108140>] (ret_fast_syscall+0x0/0x28)
Jan  1 01:00:28 grpi kernel: [   12.171955] ---[ end trace 35939b95472d70af ]---
Jan  1 01:00:28 grpi kernel: [   12.172031] vc4-drm soc:gpu: bound 3fc00000.v3d (ops vc4_v3d_ops [vc4])
Jan  1 01:00:28 grpi kernel: [   12.172613] [drm] Initialized vc4 0.0.0 20140616 for soc:gpu on minor 0

Later, these errors were printed in the messages log:

Jan  6 00:58:48 grpi kernel: [  497.632306] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:53:crtc-2] flip_done timed out
Jan  6 00:59:11 grpi kernel: [  521.184305] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:52:plane-20] flip_done timed out

The system continues to function as normal and video rendering is not impacted: Kodi plays everything just fine. At this stage I cannot say whether the DRM changes introduced in 4.15.y have any side effects in their interaction with the vc4 module.

So far I do not see the memory leak originally reported in this issue, but I need to monitor for a while longer and will let you know whether it is resolved.

Cheers, -Nik

nkichukov commented 6 years ago

Hello Eric, the patch fixes the memory leak reported here.

If you believe there is something you can do about the stack trace when the module loads and the two errors that show up, let me know and I can open another issue to track it separately. If not, just close this one, as the suggested patch worked and I no longer see the memory leak.

Thank you, -Nik

stschake commented 6 years ago

The warning is fixed upstream:

http://lists.infradead.org/pipermail/linux-rpi-kernel/2017-December/007226.html

nkichukov commented 6 years ago

Thanks Stefan, I see this is not merged into rpi-4.15-rc7 yet. I will apply the patches manually for now, as I plan on moving to rc7: it has the proposed KAISER/Meltdown patches from mainline, which seem to affect the Cortex-A53 ARM CPUs too.

I will let you know if the warning is gone once I have the patched kernel booted up.

Cheers, -N

nkichukov commented 6 years ago

rpi-4.15-rc7 with both patches applied resolves all of the issues described above. I hope those get merged into mainline soon so I no longer have to patch manually for the next kernel upgrade.

Thanks for your support and keep up the good work making free software better and better! -Nikolay