ARM-software / tf-issues

Issue tracking for the ARM Trusted Firmware project
HiKey960: psci issues on A73s #638

Open valschneider opened 5 years ago

valschneider commented 5 years ago

Hi,

I was cleaning up a hotplug torture test and happened to run it on my HiKey960 (Debian), which resulted in a failure.

Turns out just a few hotplug operations are needed to trigger this, so I boiled it down to this small script:

for ((i = 0; i < 4; i++)); do
    echo "OFF $i"
    echo 0 > /sys/devices/system/cpu/cpu$i/online
    echo "ON $i"
    echo 1 > /sys/devices/system/cpu/cpu$i/online
    echo
done

Running it gives the following output (which is all fine):

----->8-----
OFF 0
[ 80.819925] CPU0: shutdown
[ 80.823851] psci: CPU0 killed.
ON 0
[ 80.841609] Detected VIPT I-cache on CPU0
[ 80.845730] CPU0: Booted secondary processor 0x0000000000 [0x410fd034]

OFF 1
[ 80.927340] CPU1: shutdown
[ 80.930204] psci: CPU1 killed.
ON 1
[ 80.948701] Detected VIPT I-cache on CPU1
[ 80.952810] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]

OFF 2
[ 81.023079] CPU2: shutdown
[ 81.026465] psci: CPU2 killed.
ON 2
[ 81.036281] Detected VIPT I-cache on CPU2
[ 81.040402] CPU2: Booted secondary processor 0x0000000002 [0x410fd034]

OFF 3
[ 81.103528] CPU3: shutdown
[ 81.106382] psci: CPU3 killed.
ON 3
[ 81.121835] Detected VIPT I-cache on CPU3
[ 81.125975] CPU3: Booted secondary processor 0x0000000003 [0x410fd034]
----->8-----

Now, if I run this for CPUs [4-7], I eventually get this (takes a few tries):

----->8-----
OFF 4
[ 73.149855] CPU4: shutdown
[ 73.152628] psci: CPU4 killed.
ON 4
[ 73.157491] Detected VIPT I-cache on CPU4
[ 73.161509] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64MMFR0_EL1. Boot CPU: 0x00000000001122, CPU4: 0x00000000101122
[ 73.173813] arch_timer: CPU4: Trapping CNTVCT access
[ 73.178782] CPU4: Booted secondary processor 0x0000000100 [0x410fd091]

OFF 5
[ 73.261245] CPU5: shutdown
[ 73.264043] psci: CPU5 killed.
ON 5
[ 74.272375] CPU5: failed to come online
[ 74.276264] CPU5: failed in unknown state : 0x0
./hotplug.sh: line 8: echo: write error: Input/output error

OFF 6
[ 74.311066] CPU6: shutdown
[ 74.313829] psci: CPU6 killed.
ON 6
[ 74.318544] Detected VIPT I-cache on CPU6
[ 74.322590] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64MMFR0_EL1. Boot CPU: 0x00000000001122, CPU6: 0x00000000101122
[ 74.334884] arch_timer: CPU6: Trapping CNTVCT access
[ 74.339854] CPU6: Booted secondary processor 0x0000000102 [0x410fd091]

OFF 7
[ 74.394989] CPU7: shutdown
[ 74.397770] psci: CPU7 killed.
ON 7
[ 74.402295] Detected VIPT I-cache on CPU7
[ 74.406475] CPU features: SANITY CHECK: Unexpected variation in SYS_ID_AA64MMFR0_EL1. Boot CPU: 0x00000000001122, CPU7: 0x00000000101122
[ 74.418748] arch_timer: CPU7: Trapping CNTVCT access
[ 74.423709] CPU7: Booted secondary processor 0x0000000103 [0x410fd091]
----->8-----
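
Since the failure only shows up after a few tries, the big-cluster run can be repeated in a loop until a CPU fails to come back. The following is a minimal sketch, not the original hotplug.sh: the names `CPU_SYSFS`, `hotplug_cycle` and `hotplug_until_failure` are made up for illustration, and the sysfs root is overridable only so the functions can be tried against a scratch directory.

```shell
# Sketch only: cycle a range of CPUs off and on until a write fails.
# CPU_SYSFS is a hypothetical override; on a real board leave it unset
# and run as root.
CPU_SYSFS="${CPU_SYSFS:-/sys/devices/system/cpu}"

# hotplug_cycle FIRST LAST
# Offlines, then onlines, each CPU in [FIRST, LAST].
# Returns 1 at the first write that fails, 0 otherwise.
hotplug_cycle() {
    local first=$1 last=$2 cpu node
    cpu=$first
    while [ "$cpu" -le "$last" ]; do
        node="$CPU_SYSFS/cpu$cpu/online"
        echo "OFF $cpu"
        if ! echo 0 > "$node"; then
            echo "CPU$cpu: offline failed" >&2
            return 1
        fi
        echo "ON $cpu"
        if ! echo 1 > "$node"; then
            echo "CPU$cpu: online failed" >&2
            return 1
        fi
        cpu=$((cpu + 1))
    done
    return 0
}

# hotplug_until_failure FIRST LAST MAX_ROUNDS
# Repeats hotplug_cycle up to MAX_ROUNDS times and reports the round
# in which the first failure occurred, if any.
hotplug_until_failure() {
    local first=$1 last=$2 max=$3 round
    round=1
    while [ "$round" -le "$max" ]; do
        if ! hotplug_cycle "$first" "$last"; then
            echo "failed in round $round" >&2
            return 1
        fi
        round=$((round + 1))
    done
    echo "no failure after $max rounds"
}
```

On the board this would be invoked as, for example, `hotplug_until_failure 4 7 20`. Stopping at the first failed write keeps the failing CPU in its broken state for inspection.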

Trying to online CPU5 yet again yields a slightly different result:

----->8-----
[ 74.528657] psci: failed to boot CPU5 (-22)
[ 74.534577] CPU5: failed to boot: -22
[ 74.538291] CPU5: failed in unknown state : 0x0
./hotplug.sh: line 8: echo: write error: Invalid argument
----->8-----

Adding a printk shows that the actual return value from the SMCCC call is -5 (PSCI_RET_ON_PENDING ?).
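
For reference, here is a small sketch of how the raw value relates to the -22 in the log. The return-code names follow the PSCI specification. The errno translation is an assumption based on my reading of the kernel's `psci_to_linux_errno()`, where any code without an explicit case falls through to -EINVAL. The helper names `psci_ret_name` and `psci_to_errno` are hypothetical.

```shell
# psci_ret_name CODE
# Prints the symbolic name of a PSCI return code (values per the PSCI spec).
psci_ret_name() {
    case "$1" in
        0)  echo SUCCESS ;;
        -1) echo NOT_SUPPORTED ;;
        -2) echo INVALID_PARAMETERS ;;
        -3) echo DENIED ;;
        -4) echo ALREADY_ON ;;
        -5) echo ON_PENDING ;;
        -6) echo INTERNAL_FAILURE ;;
        -7) echo NOT_PRESENT ;;
        -8) echo DISABLED ;;
        -9) echo INVALID_ADDRESS ;;
        *)  echo UNKNOWN ;;
    esac
}

# psci_to_errno CODE
# Prints the Linux errno the kernel would report for a PSCI return code.
# Assumption: mirrors psci_to_linux_errno(); unlisted codes become -EINVAL.
psci_to_errno() {
    case "$1" in
        0)     printf '%s\n' 0 ;;
        -1)    printf '%s\n' -95 ;;   # -EOPNOTSUPP
        -2|-9) printf '%s\n' -22 ;;   # -EINVAL
        -3)    printf '%s\n' -1 ;;    # -EPERM
        *)     printf '%s\n' -22 ;;   # default: -EINVAL
    esac
}
```

Under that assumption, `psci_ret_name -5` gives ON_PENDING and `psci_to_errno -5` gives -22, which would be consistent with the "failed to boot CPU5 (-22)" message hiding the firmware's actual answer.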

No matter what I do next, I can't bring the CPU back online and have to reboot the board. The failure doesn't seem tied to any particular big CPU: I've also had it happen for CPUs 4 and 7.

It happens both on mainline (4.19-rc7, 3a27203102eb) and on linux-next (774ea0551a29). I tried bisecting this but it's a bit tricky since the mainline support for HiKey960 is relatively recent.

Now, I'm fairly certain the issue comes from the PSCI implementation, which is why I'm raising this issue here instead of on LKML - do tell me if I'm misguided.

I use Linaro builds from here: https://snapshots.linaro.org/96boards/reference-platform/components/uefi-staging/. I have build 65 (which for some reason is no longer listed there), which gives me:

NOTICE: BL2: v1.5(release):v1.5-649-g7e8a891f
NOTICE: BL2: Built : 07:13:26, Aug 14 2018

I tried the latest build (78), but my board refuses to boot with it. I'll try to play with other builds when I have the time, but someone might be interested in this...

Cheers, Valentin

johnstultz-work commented 5 years ago

I've opened a bug and copied your issue over here to track this: https://bugs.96boards.org/show_bug.cgi?id=783

ghost commented 5 years ago

@hzhuang1

hzhuang1 commented 5 years ago

Build #79 is ready, and Leo is following up on this issue. Everything is recorded in https://bugs.96boards.org/show_bug.cgi?id=783.

ghost commented 5 years ago

This PR should fix this issue, right? https://github.com/ARM-software/arm-trusted-firmware/pull/1897