Miouyouyou / RockMyy

Build scripts and patches used to cross-compile 5.6-rcX kernels for RK3288 boards
MIT License
32 stars 8 forks source link

Framebuffer : Vblank timeout #9

Closed Miouyouyou closed 4 years ago

Miouyouyou commented 6 years ago

On some configurations, the framebuffer fails to initialize the screen correctly.

It seems that the problem started to occur since some 4.16-rcX version. In such events, the system boots fine and X11 starts without issues.

So it's highly probable that this is a framebuffer related issue.

[    1.277829] WARNING: CPU: 2 PID: 102 at drivers/gpu/drm/drm_atomic_helper.c:1349 drm_atomic_helper_wait_for_vblanks+0x1c8/0x1e0
[    1.277831] [CRTC:30:crtc-0] vblank wait timed out
[    1.277833] Modules linked in:
[    1.277840] CPU: 2 PID: 102 Comm: kworker/2:4 Not tainted 4.16.0-RockMyy-16Bits-Edition #1
[    1.277842] Hardware name: Rockchip (Device Tree)
[    1.277848] Workqueue: events deferred_probe_work_func
[    1.277860] [<c010f664>] (unwind_backtrace) from [<c010b7d4>] (show_stack+0x10/0x14)
[    1.277868] [<c010b7d4>] (show_stack) from [<c0c711c4>] (dump_stack+0x70/0x8c)
[    1.277875] [<c0c711c4>] (dump_stack) from [<c011f278>] (__warn+0xdc/0xfc)
[    1.277881] [<c011f278>] (__warn) from [<c011f2cc>] (warn_slowpath_fmt+0x34/0x44)
[    1.277887] [<c011f2cc>] (warn_slowpath_fmt) from [<c070b318>] (drm_atomic_helper_wait_for_vblanks+0x1c8/0x1e0)
[    1.277894] [<c070b318>] (drm_atomic_helper_wait_for_vblanks) from [<c070cc80>] (drm_atomic_helper_commit_tail_rpm+0x48/0x58)
[    1.277900] [<c070cc80>] (drm_atomic_helper_commit_tail_rpm) from [<c070cb04>] (commit_tail+0x40/0x6c)
[    1.277904] [<c070cb04>] (commit_tail) from [<c070cc2c>] (drm_atomic_helper_commit+0xf4/0x100)
[    1.277910] [<c070cc2c>] (drm_atomic_helper_commit) from [<c070ed50>] (restore_fbdev_mode_atomic+0x7c/0x1b4)
[    1.277917] [<c070ed50>] (restore_fbdev_mode_atomic) from [<c07115a8>] (drm_fb_helper_restore_fbdev_mode_unlocked+0x48/0x7c)
[    1.277923] [<c07115a8>] (drm_fb_helper_restore_fbdev_mode_unlocked) from [<c0711628>] (drm_fb_helper_set_par+0x4c/0x5c)
[    1.277929] [<c0711628>] (drm_fb_helper_set_par) from [<c06a8eec>] (fbcon_init+0x2cc/0x45c)
[    1.277937] [<c06a8eec>] (fbcon_init) from [<c06df5d8>] (visual_init+0xc0/0x110)
[    1.277943] [<c06df5d8>] (visual_init) from [<c06e0db4>] (do_bind_con_driver+0x258/0x2d4)
[    1.277947] [<c06e0db4>] (do_bind_con_driver) from [<c06e1188>] (do_take_over_console+0x158/0x19c)
[    1.277952] [<c06e1188>] (do_take_over_console) from [<c06a90cc>] (do_fbcon_takeover+0x50/0xc0)
[    1.277958] [<c06a90cc>] (do_fbcon_takeover) from [<c013c7a4>] (notifier_call_chain+0x44/0x68)
[    1.277964] [<c013c7a4>] (notifier_call_chain) from [<c013ca20>] (__blocking_notifier_call_chain+0x40/0x58)
[    1.277970] [<c013ca20>] (__blocking_notifier_call_chain) from [<c013ca4c>] (blocking_notifier_call_chain+0x14/0x1c)
[    1.277977] [<c013ca4c>] (blocking_notifier_call_chain) from [<c06a1a88>] (register_framebuffer+0x5c/0x268)
[    1.277984] [<c06a1a88>] (register_framebuffer) from [<c0711278>] (__drm_fb_helper_initial_config_and_unlock+0x2b0/0x344)
[    1.277991] [<c0711278>] (__drm_fb_helper_initial_config_and_unlock) from [<c073a268>] (rockchip_drm_fbdev_init+0xa0/0x104)
[    1.277996] [<c073a268>] (rockchip_drm_fbdev_init) from [<c0735d88>] (rockchip_drm_bind+0x158/0x1cc)
[    1.278002] [<c0735d88>] (rockchip_drm_bind) from [<c0749a70>] (try_to_bring_up_master+0x68/0x158)
[    1.278008] [<c0749a70>] (try_to_bring_up_master) from [<c0749d88>] (component_master_add_with_match+0xa8/0xe0)
[    1.278014] [<c0749d88>] (component_master_add_with_match) from [<c0735fb0>] (rockchip_drm_platform_probe+0x1b4/0x204)
[    1.278019] [<c0735fb0>] (rockchip_drm_platform_probe) from [<c07505a8>] (platform_drv_probe+0x50/0x9c)
[    1.278025] [<c07505a8>] (platform_drv_probe) from [<c074ebc4>] (driver_probe_device+0x240/0x308)
[    1.278029] [<c074ebc4>] (driver_probe_device) from [<c074d410>] (bus_for_each_drv+0x84/0x8c)
[    1.278034] [<c074d410>] (bus_for_each_drv) from [<c074e8f8>] (__device_attach+0x98/0x100)
[    1.278039] [<c074e8f8>] (__device_attach) from [<c074deb8>] (bus_probe_device+0x28/0x80)
[    1.278043] [<c074deb8>] (bus_probe_device) from [<c074e3e4>] (deferred_probe_work_func+0x108/0x120)
[    1.278050] [<c074e3e4>] (deferred_probe_work_func) from [<c01360e0>] (process_one_work+0x220/0x468)
[    1.278056] [<c01360e0>] (process_one_work) from [<c0136354>] (process_scheduled_works+0x2c/0x30)
[    1.278061] [<c0136354>] (process_scheduled_works) from [<c0137090>] (worker_thread+0x2b8/0x3e8)
[    1.278066] [<c0137090>] (worker_thread) from [<c013b1dc>] (kthread+0x130/0x148)
[    1.278071] [<c013b1dc>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[    1.278074] Exception stack(0xed8a1fb0 to 0xed8a1ff8)
[    1.278077] 1fa0:                                     00000000 00000000 00000000 00000000
[    1.278081] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    1.278084] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    1.278087] ---[ end trace 0d44f5a5d5fde61d ]---
samsonluk commented 6 years ago

Start working on compilation of 4.16-rc4...

samsonluk commented 6 years ago

Prompting kernel config questions this time... Restart script after

$ cp tmp/RockMyy-0dfd08eeb9ca1ece37b083e22c494afd5f4a379c/config/v4.16/config-latest .config
$ cp .config linux
samsonluk commented 6 years ago

PASSED! No vblank timeout dmesg_4.16-rc4-RockMyy-16Bits-Edition.log

Miouyouyou commented 6 years ago

Alright...

My guess is that it could be either this change : https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm?h=v4.16&id=b24791fe00f8b089d5b10cb7bcc4e1ae88b4831b or this change https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm?h=v4.16&id=5a9f698feb11b198f17b2acebbfe0e2716a3beed

Both were introduced with the kernel 4.16-rc7.

One way to test one of the patches, without the other, would be to :

If that still fails... It will be time for some "printk" debugging.

samsonluk commented 6 years ago

Ok, start compile 4.16-rc7

samsonluk commented 6 years ago

With both changes:

Linux tinkerboard 4.16.0-rc7-RockMyy-16Bits-Edition #1 SMP PREEMPT Mon Jul 23 15:45:32 UTC 2018 armv7l
dmesg | grep vblank
[    1.112528] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    1.112531] [drm] No driver support for vblank timestamp query.
[    1.287346] WARNING: CPU: 0 PID: 102 at drivers/gpu/drm/drm_atomic_helper.c:1349 drm_atomic_helper_wait_for_vblanks+0x1c8/0x1e0
[    1.287348] [CRTC:30:crtc-0] vblank wait timed out
[    1.287404] [<c011f2ec>] (warn_slowpath_fmt) from [<c070b2f8>] (drm_atomic_helper_wait_for_vblanks+0x1c8/0x1e0)
[    1.287411] [<c070b2f8>] (drm_atomic_helper_wait_for_vblanks) from [<c070cc60>] (drm_atomic_helper_commit_tail_rpm+0x48/0x58)

git apply -R ./GetFB.patch

dmesg | grep vblank
[    1.113467] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    1.113469] [drm] No driver support for vblank timestamp query.
[    1.288810] WARNING: CPU: 0 PID: 101 at drivers/gpu/drm/drm_atomic_helper.c:1349 drm_atomic_helper_wait_for_vblanks+0x1c8/0x1e0
[    1.288851] [CRTC:30:crtc-0] vblank wait timed out
[    1.288909] [<c011f2ec>] (warn_slowpath_fmt) from [<c070b2f8>] (drm_atomic_helper_wait_for_vblanks+0x1c8/0x1e0)
[    1.288915] [<c070b2f8>] (drm_atomic_helper_wait_for_vblanks) from [<c070cc60>] (drm_atomic_helper_commit_tail_rpm+0x48/0x58)

git apply -R ./DRMAST.patch error: drivers/gpu/drm/ast/ast_tables.h: No such file or directory

Oh, should have backup linux/ before git apply...

Miouyouyou commented 6 years ago

You can do git checkout -- . to revert non commited changes.

I'm surprised you don't have this file, since it's still present in the latest 4.18-rc. Be sure to apply the patch from the linux directory, else the patch will fail.

Anyway, we're sure that it's between the rc4 and rc7. So if you're good for a last compilation, try the -rc6. If the problem doesn't occur with the -rc6, that it's clearly in between. In this case, keep the linux/ of the rc6 and rc7`, we'll need it, since we'll have to add some "printk" to the code, in order to understand how things are initialised in both trees and pinpoint the differences.

samsonluk commented 6 years ago

Oops, stupid me! I actually execute the wget/git apply in the wrong machine...

Anyway, saved linux/ of rc7

git apply -R ./DRMAST.patch

dmesg | grep vblank
[    1.111755] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    1.111758] [drm] No driver support for vblank timestamp query.
[    1.284329] WARNING: CPU: 3 PID: 102 at drivers/gpu/drm/drm_atomic_helper.c:1349 drm_atomic_helper_wait_for_vblanks+0x1c8/0x1e0
[    1.284331] [CRTC:30:crtc-0] vblank wait timed out
[    1.284388] [<c011f2ec>] (warn_slowpath_fmt) from [<c070b2f8>] (drm_atomic_helper_wait_for_vblanks+0x1c8/0x1e0)
[    1.284395] [<c070b2f8>] (drm_atomic_helper_wait_for_vblanks) from [<c070cc60>] (drm_atomic_helper_commit_tail_rpm+0x48/0x58)

So not rc7. Go for -rc6 now...

samsonluk commented 6 years ago

rc6 PASSED!

Linux tinkerboard 4.16.0-rc6-RockMyy-16Bits-Edition #1 SMP PREEMPT Tue Jul 24 08:27:54 UTC 2018 armv7l GNU/Linux
root@tinkerboard:~#
root@tinkerboard:~# dmesg | grep vblank
[    1.112326] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    1.112328] [drm] No driver support for vblank timestamp query.

I have both linux/ ready: drwxr-xr-x 27 root root 4096 Jul 24 10:03 linux_4.16-rc6 drwxr-xr-x 25 root root 4096 Jul 24 07:18 linux_4.16-rc7

Miouyouyou commented 6 years ago

Here's the patch that should apply to both. This will pollute the logs so get the logs quickly.

From aef02775f33ba10090388316648e1206f45e74fc Mon Sep 17 00:00:00 2001
From: "Miouyouyou (Myy)" <myy@miouyouyou.fr>
Date: Tue, 24 Jul 2018 21:18:17 +0200
Subject: [PATCH] Let's pollute the logs

To get some informations about CRTC.
Not really working at the moment.
Most informations are bogus.

Signed-off-by: Miouyouyou (Myy) <myy@miouyouyou.fr>
---
 drivers/gpu/drm/drm_atomic_helper.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index ae3cbfe9e..37f1fc62a 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1345,6 +1345,23 @@ drm_atomic_helper_wait_for_vblanks(struct drm_device *dev,
                                        drm_crtc_vblank_count(crtc),
                                msecs_to_jiffies(50));

+               dev_warn(dev->dev, "(Myy)\n"
+                       "\tCRTC[%i] (%d : %s)\n"
+                       "\tOld state last vblank Count : %llu\n"
+                       "\tCRTC Vblank_count           : %llu\n"
+                       "\tWait Event Timeout Ret      :  %d\n"
+                       "\tCRTC Interrupt enabled      ?  %d\n"
+                       "\tCRTC Interrupt number       :  %d\n"
+                       "\tCRTC Device                 :  %p\n"
+                       "\tArgument Device             :  %p\n",
+                       i, crtc->base.id, crtc->name,
+                       old_state->crtcs[i].last_vblank_count,
+                       drm_crtc_vblank_count(crtc),
+                       ret,
+                       (crtc->dev ? crtc->dev->irq_enabled : 0),
+                       (crtc->dev ? crtc->dev->irq : 0),
+                       crtc->dev, dev);
+
                WARN(!ret, "[CRTC:%d:%s] vblank wait timed out\n",
                     crtc->base.id, crtc->name);

-- 
2.16.4

Once you're done with the patch, try to the following :

cd linux_4.16-rc7
wget https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=3215b9d57a2c75c4305a3956ca303d7004485200 -O Clocks.patch
git apply -R Clocks.patch

And see how it goes.

Miouyouyou commented 6 years ago

Note that this patch should be applied to both kernels and tested on both, in order to see if there's any difference between them.

The last commands only target the rc7.

samsonluk commented 6 years ago

What's wrong? linux/ is a clone of linux_4.16-rc6

root@91d81f86bc53:/usr/src/linux# git am a.patch
Applying: Let's pollute the logs
error: patch failed: drivers/gpu/drm/drm_atomic_helper.c:1345
error: drivers/gpu/drm/drm_atomic_helper.c: patch does not apply
Patch failed at 0001 Let's pollute the logs
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
samsonluk commented 6 years ago

Resolved:

root@91d81f86bc53:/usr/src/linux# git am --ignore-whitespace --ignore-space-change a.patch
Applying: Let's pollute the logs
samsonluk commented 6 years ago

this is linux_4.16-rc7 after "Applying: Let's pollute the logs" patch:

root@91d81f86bc53:/usr/src/linux# git apply -R Clocks.patch
error: patch failed: drivers/clk/clk-aspeed.c:259
error: drivers/clk/clk-aspeed.c: patch does not apply
root@91d81f86bc53:/usr/src/linux# git apply -v -R Clocks.patch
Checking patch drivers/clk/sunxi-ng/ccu-sun6i-a31.c...
Checking patch drivers/clk/imx/clk-imx51-imx53.c...
Checking patch drivers/clk/ti/clkctrl.c...
Checking patch drivers/clk/ti/clk-33xx.c...
Checking patch drivers/clk/ti/clk-43xx.c...
Checking patch drivers/clk/clk.c...
Hunk #1 succeeded at 2311 (offset -2 lines).
Checking patch drivers/clk/clk.c...
Hunk #1 succeeded at 2969 (offset -2 lines).
Hunk #2 succeeded at 3002 (offset -2 lines).
Checking patch drivers/clk/clk.c...
Checking patch drivers/clk/hisilicon/clk-hi3660-stub.c...
Checking patch drivers/clk/qcom/apcs-msm8916.c...
Checking patch drivers/clk/clk-aspeed.c...
error: while searching for:
{
        struct aspeed_clk_gate *gate = to_aspeed_clk_gate(hw);
        u32 clk = BIT(gate->clock_idx);
        u32 enval = (gate->flags & CLK_GATE_SET_TO_DISABLE) ? 0 : clk;
        u32 reg;

        regmap_read(gate->map, ASPEED_CLK_STOP_CTRL, &reg);

        return ((reg & clk) == enval) ? 1 : 0;
}

static const struct clk_ops aspeed_clk_gate_ops = {

error: patch failed: drivers/clk/clk-aspeed.c:259
error: drivers/clk/clk-aspeed.c: patch does not apply
Checking patch drivers/clk/clk-aspeed.c...
Checking patch drivers/clk/bcm/clk-bcm2835.c...
Checking patch drivers/clk/bcm/clk-bcm2835.c...
root@91d81f86bc53:/usr/src/linux#
Miouyouyou commented 6 years ago

Which version does /usr/src/linux links to ? This is the latest patch affecting drivers/clk/clk-aspeed.c for the 4.16-rc7, so I don't know why it does not apply... I'll try to prepare a tarball if needed.

samsonluk commented 6 years ago

It is 4.16-rc7, I even deleted linux/ and restarted the script again from scratch:

root@91d81f86bc53:/usr/src# more GetPatchAndCompileKernel_4.16-rc7.sh
export ARCH=arm
export CROSS_COMPILE=

export KERNEL_GIT_URL='git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git'

export KERNEL_SERIES=v4.16
export KERNEL_BRANCH=v4.16-rc7
export LOCALVERSION=-RockMyy-16Bits-Edition
export MALI_VERSION=r19p0-01rel0
export MALI_BASE_URL=https://developer.arm.com/-/media/Files/downloads/mali-drivers/kernel/mali-midgard-gpu

Note: checking out '3eb2ce825ea1ad89d20f7a3b5780df850e4be274'.

Miouyouyou commented 6 years ago

Well, let's try it like that than :

wget https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/drivers/clk/clk-aspeed.c?id=8a53fc511c5ec81347b981b438f68c3dde421608 -O clk.patch
git apply -R clk.patch

I'm just wondering if some Clock patches generated issues with the HDMI detection.

samsonluk commented 6 years ago

Here come the logs: dmesg_4.16-rc6_MyyPatch-RockMyy-16Bits-Edition.log dmesg_4.16-rc7_MyyPatch-RockMyy-16Bits-Edition.log

Miouyouyou commented 6 years ago

Hmm, I guess I messed up the vblank count display. In the first case, it seems to be at 0 (2**32 overflowing at 0) In the second case, it seems to be at -46, if it's a 32 bits value...

I'll try to modify my script so that you can pull the kernel sources at one specific commit...

It seems like a very subtle bug. If you remove the timestamps of both dmesg logs and you cut them before the first "Myy" log and do a diff, both logs are almost identical. Some pointers addresses change, but that's due to the recompilation process and various sources change, but these addresses don't seem null or completely off (like 0xffffffff).

That some tricky bug...

Could you test a compilation by doing like this :

mv linux{,.bak}
wget https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/snapshot/linux-3215b9d57a2c75c4305a3956ca303d7004485200.tar.gz -O linux.tgz
tar zxvf linux.tgz
mv linux-3215* linux
cd linux
git init .
git add .
git commit -m "My wonderful kernel"
cd ..

And then run the usual script.

samsonluk commented 6 years ago

FYI, Nodepad++ for Windows can compare the logs without cutting time stamp.

I will try that out tomorrow...

samsonluk commented 6 years ago

Done dmesg

Miouyouyou commented 6 years ago

Was the screen plugged-in during the boot sequence ? I'm reading this, in your last log :

[    1.114972] [drm] Cannot find any crtc or sizes
samsonluk commented 6 years ago

Oops. switched off

Miouyouyou commented 4 years ago

If you still have these issues, feel free to reopen this ticket. Meanwhile, I'm closing it due to no activity.