pekka-saastamoinen-etteplan closed 6 years ago
ARM Internal Ref: IOTTHD-2778
@karsev @MarceloSalazar
@pekka-saastamoinen-tridonic-com @markus-becker-tridonic-com Thanks for raising this - let us know once you have an application that can be used to reproduce this behavior. We've been investigating based on the information you've shared, but haven't seen the BR crashing so far. Thanks!
This branch should show the changes to mbed-os we applied to have the nodes ping the BR:
https://github.com/pekka-saastamoinen-tridonic-com/mbed-os/tree/pekka-saastamoinen-tridonic-com-ping
@pekka-saastamoinen-tridonic-com what node application and configuration are you referring to? Can you fork the app, make the changes and point us at that, so we can just clone and reproduce the issue?
I was not able to reproduce the hard fault. The BR runs out of memory during heavy ping testing but recovers at the end.
Error reproduced: there is a stack overflow in the Nanostack code that causes the hard fault. A fix will be released once it passes testing.
There was a recursive loop in the MAC error handling that caused the stack overflow. The hard fault happened because the Mbed timers were using the corrupted stack.
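To illustrate the failure mode, here is a minimal host-side model in Python. It is not the Nanostack code (the actual change lives in the private PRs); the handler names and the event queue are hypothetical. It only assumes that a failed retry re-enters the error handler synchronously, so each failure adds a stack frame, whereas posting the follow-up failure to a queue keeps the call depth constant.

```python
from collections import deque


def on_tx_failed_recursive(failures_left, depth=1):
    """Hypothetical handler: the retry fails at once and re-enters the handler.

    Returns the deepest call depth reached. Every extra failure costs a frame.
    """
    if failures_left == 0:
        return depth
    return on_tx_failed_recursive(failures_left - 1, depth + 1)


def on_tx_failed_deferred(failures_left):
    """Same sequence of failures, but each follow-up is queued, not called.

    Returns the number of failure events handled. Call depth stays constant.
    """
    events = deque(["tx_failed"])  # hypothetical event queue
    handled = 0
    while events:
        events.popleft()
        handled += 1
        if failures_left > 0:
            failures_left -= 1
            events.append("tx_failed")
    return handled
```

With a long enough run of failures the recursive variant exhausts the stack (Python reports this as `RecursionError`; on the target nothing stops it from overwriting whatever sits next to the stack), while the deferred variant handles the same run in a loop.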
Two PRs that remove the recursion are now merged into the master branch: https://github.com/ARMmbed/sal-stack-nanostack-private/pull/1826 and https://github.com/ARMmbed/sal-stack-nanostack-private/pull/1830.
The nanostack-border-router can be crashed by pinging it heavily from about 8 Thread nodes. We first noticed this on our fork of the code and decided to test it on the reference hardware as well.
HW: Raspberry Pi 3 + 6lowpan shield
Raspi image from Mbed access point: https://github.com/ARMmbed/mbed-access-point/blob/master/binaries/openwrt-mbedap-v4.0.1-brcm2708-bcm2710-rpi-3-ext4-sdcard.img.gz
nanostack-border-router binary built from https://github.com/ARMmbed/nanostack-border-router/commit/fa34a9d474adba58ea31752465ba39c6704995d2, GCC_ARM toolchain, almost stock SLIP config (channel, PAN ID and keys matched to the node setup).
Connected 8 Thread nodes. Each node joins the network, waits 30 seconds and then pings the BR every 50 ms. The BR Mbed app crashes within about 30 seconds.
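For a rough idea of the load this setup puts on the BR, the arithmetic can be sketched as below. It assumes all 8 nodes ping at a steady 50 ms interval and that no requests are lost on the way, which will not hold exactly on a real mesh.

```python
NODES = 8
PING_INTERVAL_MS = 50
TIME_TO_CRASH_S = 30  # "within about 30 seconds" from the report


def aggregate_ping_rate(nodes, interval_ms):
    """Echo requests per second arriving at the BR, assuming no loss."""
    return nodes * 1000 / interval_ms


rate = aggregate_ping_rate(NODES, PING_INTERVAL_MS)
print(rate)                    # 160.0 requests per second
print(rate * TIME_TO_CRASH_S)  # 4800.0 requests before the crash, at most
```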
Crash variant 1:
++ MbedOS Error Info ++
Error Status: 0x80FF013D Code: 317 Module: 255
Error Message: Fault exception
Location: 0x4B5D7
Error Value: 0x3F10
Current Thread: Id: 0x20004048 Entry: 0x14735 StackSize: 0x1800 StackMem: 0x20002848 SP: 0x2002FF40
-- MbedOS Error Info --
Crash Info:
Crash location = mbed::TimerEvent::irq(unsigned long) [0x00003F10] (based on PC value)
Caller location = ticker_irq_handler [0x0004A6B3] (based on LR value)
Stack Pointer at the time of crash = [2002FFC8]
Target and Fault Info:
Processor Arch: ARM-V7M or above
Processor Variant: C24
Forced exception, a fault with configurable priority has been escalated to HardFault
A precise data access error has occurred. Faulting address: 20030008
Crash variant 2:
++ MbedOS Error Info ++
Error Status: 0x80FF013D Code: 317 Module: 255
Error Message: Fault exception
Location: 0x4B5D7
Error Value: 0xC4BF00BC
Current Thread: Id: 0x20004048 Entry: 0x14735 StackSize: 0x1800 StackMem: 0x20002848 SP: 0x2002FF28
-- MbedOS Error Info --
Crash Info:
Crash location = __init_array_end [0xC4BF00BC] (based on PC value)
Caller location = mbed::Timeout::handler() [0x00003E3B] (based on LR value)
Stack Pointer at the time of crash = [2002FFB0]
Target and Fault Info:
Processor Arch: ARM-V7M or above
Processor Variant: C24
Forced exception, a fault with configurable priority has been escalated to HardFault
MPU or Execute Never (XN) default memory map access violation on an instruction fetch has occurred
Logs for the first variant are attached in the zip: bug_on_arm_hw_and_sw.zip
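A small host-side helper for reading these dumps, as a sketch only. The field layout used here (code in the low 16 bits, module in bits 16 to 23) is inferred from the Code and Module values printed next to the status word in the dumps above; the function names are made up for this example.

```python
def decode_mbed_error_status(status):
    """Split the 32-bit error status into the code and module fields."""
    return {
        "code": status & 0xFFFF,          # low 16 bits
        "module": (status >> 16) & 0xFF,  # bits 16..23
    }


def in_thread_stack(addr, stack_mem, stack_size):
    """True if addr lies inside [stack_mem, stack_mem + stack_size)."""
    return stack_mem <= addr < stack_mem + stack_size


# Values copied from crash variant 1.
print(decode_mbed_error_status(0x80FF013D))  # {'code': 317, 'module': 255}
print(hex(0x20002848 + 0x1800))              # 0x20004048, end of the thread stack
print(in_thread_stack(0x2002FF40, 0x20002848, 0x1800))  # False
print(in_thread_stack(0x2002FFC8, 0x20002848, 0x1800))  # False
```

Both reported SP values fall outside the stack of the thread listed as current, so the stack in use at the time of the fault was not that thread's stack. Which stack it was cannot be read from the dump alone.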
FYI: @karsev