96boards-hikey / tools-images-hikey960

Tools and images for HiKey960
BSD 2-Clause "Simplified" License

Boot hangs #19

Closed: jforissier closed this issue 6 years ago

jforissier commented 6 years ago

Commit 92f365073cd8 ("lpm3, xloader: fix spi2 and i2c0 clock slow issue") breaks our OP-TEE environment (see https://github.com/OP-TEE/optee_os/issues/1851).

To reproduce:

$ mkdir -p $HOME/devel/optee
$ cd $HOME/devel/optee
$ repo init -u https://github.com/OP-TEE/manifest.git -m hikey960.xml
$ repo sync
$ (cd tools-images-hikey960; git checkout 92f365073cd8)
$ make -j9
$ make flash

Boot hangs at:

[ 0.753291] hi3660-mbox e896b000.mailbox: Mailbox enabled
[ 21.773459] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 21.779191] 5-...: (1 GPs behind) idle=135/140000000000000/0 softirq=21/22 fqs=2625
[ 21.787109] (detected by 0, t=5252 jiffies, g=-289, c=-290, q=111)
[ 21.793448] Task dump for CPU 5:
[ 21.796706] swapper/0 R running task 0 1 0 0x00000002
[ 21.803837] Call trace:
[ 21.806320] [] __switch_to+0x94/0xa8
[ 21.811513] [<0000000000000040>] 0x40

Commit 92f365073cd8 does not mention any dependency (should we upgrade the kernel?), so I'm assuming this is a regression in tools-images-hikey960 and I'm reporting here.
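As a rough aid for reading the log above, here is a minimal Python sketch that finds the longest silent gap between kernel timestamps and converts the reported jiffies count to seconds. The log lines are copied from this report. `ASSUMED_HZ = 250` is an assumption, since the kernel's CONFIG_HZ is not stated anywhere in this thread, and the helper names are made up for illustration.

```python
import re

# Sample lines copied from the log in this report.
LOG = """\
[0.753291] hi3660-mbox e896b000.mailbox: Mailbox enabled
[ 21.773459] INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 21.787109] (detected by 0, t=5252 jiffies, g=-289, c=-290, q=111)
"""

# Kernel timestamp at the start of a line, with optional padding spaces.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")
# The "t=<n> jiffies" field of the RCU stall report.
JIFFIES = re.compile(r"\bt=(\d+) jiffies")

# Assumption: CONFIG_HZ is not given in this thread; 250 is a guess.
ASSUMED_HZ = 250


def timestamps(log):
    """Return the kernel timestamps (in seconds) of all lines that carry one."""
    found = []
    for line in log.splitlines():
        match = TIMESTAMP.match(line)
        if match:
            found.append(float(match.group(1)))
    return found


def largest_gap(log):
    """Return (gap_seconds, timestamp_before_gap) for the longest silent period."""
    ts = timestamps(log)
    gaps = [(later - earlier, earlier) for earlier, later in zip(ts, ts[1:])]
    return max(gaps)


def stall_seconds(log, hz=ASSUMED_HZ):
    """Convert the reported jiffies count to seconds, or None if absent."""
    match = JIFFIES.search(log)
    return int(match.group(1)) / hz if match else None


gap, start = largest_gap(LOG)
print(f"longest silent gap: {gap:.3f} s after t={start:.6f}")
print(f"stall duration at HZ={ASSUMED_HZ}: {stall_seconds(LOG):.3f} s")
```

Under the HZ=250 assumption, 5252 jiffies is about 21.0 s, which is in line with the roughly 21 s of silence after the mailbox line. With a different HZ the conversion changes accordingly.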

hzhuang1 commented 6 years ago

@jforissier I want to reproduce your issue, and I'd like to confirm the environment first. Are you using the latest ATF/UEFI code? Which kernel branch are you using? Looping in @vchong.

vchong commented 6 years ago

@hzhuang1 This involves AOSP and probably all other environments too. See https://android-review.googlesource.com/502385. We think the cause is probably the "disable ocldo on big cluster" change in https://github.com/96boards-hikey/tools-images-hikey960/commit/92f365073cd812a8d8ca159b01f52dd6dd06d111. Looping in @docularxu.

Leo-Yan commented 6 years ago

"disable ocldo on big cluster" is a bug fix for a system hang issue. From the log, the CPU locks up when sending the mailbox message, so I suspect the memory layout has some conflict between the MCU and OP-TEE?

hzhuang1 commented 6 years ago

I think the MCU is using the memory regions below, which are defined in HiKey960Mem.c of UEFI:

{ 0x89B80000, 0x00100000 }, // MCU Code reserved
{ 0x89C80000, 0x00040000 }  // MCU reserved

And OP-TEE is located at 0x3E000000. It seems that OP-TEE shouldn't access the MCU memory. @vchong Could you help double-check it?
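The layout question can be checked with a small interval-overlap test. This is only a sketch: the two MCU entries and the OP-TEE base address come from this comment, but the OP-TEE size (0x02000000) is a hypothetical value chosen for illustration, since no size is given in this thread. `Region` and `overlaps` are made-up helper names.

```python
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    base: int
    size: int

    @property
    def end(self):
        """First address past the region (exclusive end)."""
        return self.base + self.size


def overlaps(a, b):
    """True if the half-open ranges [base, end) of a and b share any address."""
    return a.base < b.end and b.base < a.end


# Entries quoted from HiKey960Mem.c in this comment.
MCU_REGIONS = [
    Region("MCU Code reserved", 0x89B80000, 0x00100000),
    Region("MCU reserved", 0x89C80000, 0x00040000),
]

# Base address is from this thread; the size is an ASSUMPTION for illustration.
OPTEE = Region("OP-TEE", 0x3E000000, 0x02000000)

conflicts = [r.name for r in MCU_REGIONS if overlaps(r, OPTEE)]
print("conflicts:", conflicts or "none")
```

With these numbers the ranges are far apart, so the check reports no conflict. It says nothing about whether either side accesses memory outside its own range at run time.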

hzhuang1 commented 6 years ago

Guodong is still on vacation this week. I can now reproduce this issue.

I did a few tests.

  1. Only enable little cluster and build OP-TEE into firmware.
  2. Only enable 1 core in little cluster and build OP-TEE into firmware.
  3. Build firmware without OP-TEE.

I could reproduce this issue in all three cases. The only way I could avoid it was by rolling back to commit ccb401f726 (recovery-flash: add '-e' flag for bash), which is just before commit 92f3650. It's clear that this issue is caused by sec_xloader.img.

We need the HiSilicon guys to fix this issue first. As a workaround, revert commit 92f3650 on the master branch.

Leo-Yan commented 6 years ago

Looping in @Kevin-WangTao to make him aware of this bug.

hzhuang1 commented 6 years ago

This bug has been filed in the bug system (https://bugs.96boards.org/show_bug.cgi?id=617). I recommend discussing it in bug #617 instead, for easier tracking.

Leo-Yan commented 6 years ago

@hzhuang1 I found we missed one thing: when we update sec_xloader.bin, we also need to update OpenPlatformPkg/Platforms/Hisilicon/HiKey960/Binary/lpm3.img to the latest lpm3.img, so that sec_xloader.bin and lpm3.img match each other.

hzhuang1 commented 6 years ago

@Leo-Yan In the original test, lpm3 wasn't included in OpenPlatformPkg. When I included it for testing, it didn't help with this issue.

hzhuang1 commented 6 years ago

Closing this and moving to bug #617 (https://bugs.96boards.org/show_bug.cgi?id=617).

Kevin-WangTao commented 6 years ago

@Leo-Yan The change to lpm3.img has no dependency on xloader, so it doesn't matter that the images don't match each other.

Leo-Yan commented 6 years ago

@Kevin-WangTao Thanks for the confirmation. Let's use bug #617 (https://bugs.96boards.org/show_bug.cgi?id=617) for further discussion.