SolidRun / linux-fslc

Linux kernel source tree

Crashes in call_timer_fn #17

Closed: warped-rudi closed this issue 8 years ago

warped-rudi commented 8 years ago

Starting about three weeks ago, I have been experiencing crashes like this:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = 80004000
[00000000] *pgd=00000000
Internal error: Oops: 80000007 [#1] SMP ARM
Modules linked in: rfcomm bluetooth 6lowpan_iphc brcmfmac brcmutil cfg80211 rfkill ir_lirc_codec lirc_dev ir_rc5_sz_decoder ir_sanyo_decoder ir_mce_kbd_decoder ir_sony_decoder ir_nec_decoder ir_jvc_decoder ir_rc6_decoder ir_rc5_decoder uinput
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.14.48 #1
task: dc083600 ti: dc0a8000 task.ti: dc0a8000
PC is at 0x0
LR is at call_timer_fn.isra.33+0x24/0x84
pc : [<00000000>]    lr : [<8003391c>]    psr: 60010113
sp : dc0a9e88  ip : 00000000  fp : 40000001
r10: 80c44080  r9 : dc032814  r8 : 80c440c0
r7 : 00000000  r6 : 00000100  r5 : dc0a8000  r4 : dc0a8008
r3 : dc0a9e88  r2 : 00000000  r1 : 00000000  r0 : 00000000
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 10c53c7d  Table: 6cd8004a  DAC: 00000015
Process swapper/1 (pid: 0, stack limit = 0xdc0a8238)
Stack: (0xdc0a9e88 to 0xdc0aa000)
9e80:                   dc6701e4 dc6701e4 dc032000 dc032000 dc0a9ea8 00000000
9ea0: 00200200 80033b00 dc0a9ea8 dc0a9ea8 00000000 000000a0 80c44084 dc0a8000
9ec0: 00000100 dc0a8010 00000001 8002d6a4 ffffffff 7fffffff 00000003 00000001
9ee0: 80c44080 80c3f458 80c973c0 0000000a 806150d8 ffffc715 80c440c0 00200040
9f00: 00000000 dc0a8028 00000001 00000000 f4a00100 00000001 ee94d054 dc0a8000
9f20: 00000000 8002da1c 80c3fde4 800141e4 f4a0010c 80c4a9c8 dc0a9f58 8000854c
9f40: 8049df40 00010013 ffffffff dc0a9f8c 00000001 800120c0 dc0a9fa0 3b9aca00
9f60: eb358953 00000023 ee94d050 80c4f920 eae00d36 00000023 00000001 ee94d054
9f80: dc0a8000 00000000 00000017 dc0a9fa0 00000009 8049df40 00010013 ffffffff
9fa0: eb358953 00000023 00000000 80cece34 ee94d050 00000000 80cece34 00000001
9fc0: 80c4f920 8049e0cc dc0a8030 80c4a4d0 80c4a534 806150cc 80c95ebd 80c95ebd
9fe0: dc0a8000 8000eeac dc0a8030 800678fc 80c96394 100085e4 ffffffff ffffffff
[<8003391c>] (call_timer_fn.isra.33) from [<80033b00>] (run_timer_softirq+0x184/0x208)
[<80033b00>] (run_timer_softirq) from [<8002d6a4>] (__do_softirq+0x138/0x23c)
[<8002d6a4>] (__do_softirq) from [<8002da1c>] (irq_exit+0xac/0xf4)
[<8002da1c>] (irq_exit) from [<800141e4>] (handle_IPI+0xd0/0x16c)
[<800141e4>] (handle_IPI) from [<8000854c>] (gic_handle_irq+0x58/0x5c)
[<8000854c>] (gic_handle_irq) from [<800120c0>] (__irq_svc+0x40/0x50)
Exception stack(0xdc0a9f58 to 0xdc0a9fa0)
9f40:                                                       dc0a9fa0 3b9aca00
9f60: eb358953 00000023 ee94d050 80c4f920 eae00d36 00000023 00000001 ee94d054
9f80: dc0a8000 00000000 00000017 dc0a9fa0 00000009 8049df40 00010013 ffffffff
[<800120c0>] (__irq_svc) from [<8049df40>] (cpuidle_enter_state+0x50/0xe4)
[<8049df40>] (cpuidle_enter_state) from [<8049e0cc>] (cpuidle_idle_call+0xf8/0x148)
[<8049e0cc>] (cpuidle_idle_call) from [<8000eeac>] (arch_cpu_idle+0x8/0x44)
[<8000eeac>] (arch_cpu_idle) from [<800678fc>] (cpu_startup_entry+0x100/0x144)
[<800678fc>] (cpu_startup_entry) from [<100085e4>] (0x100085e4)
Code: bad PC value
---[ end trace 2bd2d852627d1382 ]---

It seems to be related to power management, but I have no idea how to track this down. Any ideas?

linux4kix commented 8 years ago

Do you have runtime PM enabled in the kernel config?

warped-rudi commented 8 years ago

Yes.

CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
CONFIG_PM_SLEEP=y
CONFIG_PM_SLEEP_SMP=y
# CONFIG_PM_AUTOSLEEP is not set
# CONFIG_PM_WAKELOCKS is not set
CONFIG_PM_RUNTIME=y
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
# CONFIG_APM_EMULATION is not set
CONFIG_ARCH_HAS_OPP=y
CONFIG_PM_OPP=y
CONFIG_PM_CLK=y
CONFIG_PM_GENERIC_DOMAINS=y
# CONFIG_WQ_POWER_EFFICIENT_DEFAULT is not set
CONFIG_PM_GENERIC_DOMAINS_SLEEP=y
CONFIG_PM_GENERIC_DOMAINS_RUNTIME=y
CONFIG_PM_GENERIC_DOMAINS_OF=y
CONFIG_CPU_PM=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARM_CPU_SUSPEND=y

The crash happens randomly, but usually after the CuBox/Kodi is idle for a while.

linux4kix commented 8 years ago

Is this with an up-to-date kernel? There were some bugs in the fec Ethernet driver that forced me to revert some PM_RUNTIME changes I had made. Depending on the switch the board was connected to, they sometimes caused errors like this.

(In reply to Rudi Ihle's comment of Sat, Sep 12, 2015 at 1:02 PM.)

warped-rudi commented 8 years ago

Yes, I just hit it again with the latest 3.14.51. The wired Ethernet is not connected at all. However, some geexbox-specific patches are applied; I will remove them to rule that out...

warped-rudi commented 8 years ago

OK, this particular crash only happens on my "special" CuBox-i, which has a broken PHY. It is not related to power management, but to the fact that the fec driver does not shut down its timer callback even though the device is not instantiated.
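The root cause described above, a timer left armed after its owner has gone away, can be illustrated with a small user-space simulation. This is a sketch under stated assumptions, not the fec driver's code: `struct sim_timer`, `struct sim_priv`, `sim_probe()` and the other `sim_*` names are hypothetical, and zeroing the private data stands in for the memory being freed or reused. When the stale timer fires, its callback pointer reads as NULL, which matches the oops above ("PC is at 0x0" with "LR is at call_timer_fn"). In a real driver the corresponding cleanup would be a `del_timer_sync()` call on the error or remove path before the memory is released.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct timer_list (3.14-era
 * callbacks take an unsigned long cookie). This is NOT kernel code. */
struct sim_timer {
    void (*function)(unsigned long data);
    unsigned long data;
};

#define SIM_MAX_TIMERS 4
static struct sim_timer *sim_queue[SIM_MAX_TIMERS]; /* armed timers */

static void sim_add_timer(struct sim_timer *t)
{
    for (int i = 0; i < SIM_MAX_TIMERS; i++) {
        if (sim_queue[i] == NULL) {
            sim_queue[i] = t;
            return;
        }
    }
    assert(!"sim_queue full");
}

/* Rough counterpart of del_timer_sync(): take the timer off the queue. */
static void sim_del_timer(struct sim_timer *t)
{
    for (int i = 0; i < SIM_MAX_TIMERS; i++)
        if (sim_queue[i] == t)
            sim_queue[i] = NULL;
}

/* Rough analogue of run_timer_softirq()/call_timer_fn(): dequeue and fire
 * every armed timer. Returns the number fired, or -1 at the point where
 * the kernel would have branched to a NULL callback ("PC is at 0x0"). */
static int sim_run_timers(void)
{
    int fired = 0;
    for (int i = 0; i < SIM_MAX_TIMERS; i++) {
        struct sim_timer *t = sim_queue[i];
        if (t == NULL)
            continue;
        sim_queue[i] = NULL;
        if (t->function == NULL)
            return -1;
        t->function(t->data);
        fired++;
    }
    return fired;
}

/* Hypothetical driver private data with an embedded PHY-poll timer. */
struct sim_priv {
    struct sim_timer phy_timer;
    int polls;
};

static void sim_phy_poll(unsigned long data)
{
    ((struct sim_priv *)data)->polls++;
}

/* Hypothetical probe: arms the timer first, then finds the PHY broken. */
static int sim_probe(struct sim_priv *p, int phy_ok, int del_timer_on_error)
{
    p->phy_timer.function = sim_phy_poll;
    p->phy_timer.data = (unsigned long)p;
    p->polls = 0;
    sim_add_timer(&p->phy_timer);

    if (!phy_ok) {
        if (del_timer_on_error)
            sim_del_timer(&p->phy_timer); /* the cleanup that was missing */
        memset(p, 0, sizeof(*p)); /* stands in for kfree()/reuse */
        return -ENODEV;
    }
    return 0;
}
```

Calling `sim_probe(&p, 0, 0)` and then `sim_run_timers()` returns -1, the simulated NULL-callback crash. Passing 1 as the last argument removes the timer on the error path, and `sim_run_timers()` then returns 0. Note that in the real kernel the stale memory only reads as zero by chance, so the same bug can also show up as a jump to some other garbage address.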