Closed ryankurte closed 6 years ago
On further investigation it seems that the NCS36510 driver avoids this by running a thread to delegate calls to the underlying driver, which is a little more difficult (lots of data to communicate back from interrupts) but probably achievable. It might also be interesting to make the mac callbacks thread / interrupt safe to avoid the need for this.
Implementing the bare minimum to continue the choose your own adventure nets me:
#0 mesh_system_heap_error_handler (event=NS_DYN_MEM_ALLOCATE_SIZE_NOT_VALID)
at ./mbed-os/features/nanostack/FEATURE_NANOSTACK/mbed-mesh-api/source/mesh_system.c:58
#1 0x00002ff4 in heap_failure (book=0x20003084 <app_stack_heap>, reason=NS_DYN_MEM_ALLOCATE_SIZE_NOT_VALID)
at ./mbed-os/features/FEATURE_COMMON_PAL/nanostack-libservice/source/nsdynmemLIB/nsdynmemLIB.c:64
#2 0x00003212 in convert_allocation_size (book=0x20003084 <app_stack_heap>, requested_bytes=65462)
at ./mbed-os/features/FEATURE_COMMON_PAL/nanostack-libservice/source/nsdynmemLIB/nsdynmemLIB.c:174
#3 0x0000329a in ns_mem_internal_alloc (book=0x20003084 <app_stack_heap>, alloc_size=65462, direction=-1)
at ./mbed-os/features/FEATURE_COMMON_PAL/nanostack-libservice/source/nsdynmemLIB/nsdynmemLIB.c:210
#4 0x00003490 in ns_mem_alloc (heap=0x20003084 <app_stack_heap>, alloc_size=65462)
at ./mbed-os/features/FEATURE_COMMON_PAL/nanostack-libservice/source/nsdynmemLIB/nsdynmemLIB.c:300
#5 0x000034da in ns_dyn_mem_alloc (alloc_size=65462)
at ./mbed-os/features/FEATURE_COMMON_PAL/nanostack-libservice/source/nsdynmemLIB/nsdynmemLIB.c:310
#6 0x00034d96 in mac_mlme_data_base_allocate ()
#7 0x0001b3ea in ns_sw_mac_create ()
#8 0x00006b7c in nd_tasklet_network_init (device_id=0 '\000')
at ./mbed-os/features/nanostack/FEATURE_NANOSTACK/mbed-mesh-api/source/nd_tasklet.c:431
#9 0x00005f68 in LoWPANNDInterface::init (this=0x20007954 <mesh>)
at ./mbed-os/features/nanostack/FEATURE_NANOSTACK/mbed-mesh-api/source/LoWPANNDInterface.cpp:67
#10 0x00005ea6 in LoWPANNDInterface::connect (this=0x20007954 <mesh>)
at ./mbed-os/features/nanostack/FEATURE_NANOSTACK/mbed-mesh-api/source/LoWPANNDInterface.cpp:27
#11 0x000156aa in main (argc=0, argv=0x0 <osRegisterForOsEvents>) at ./source/main.cpp:34
(gdb)
Which looks a tad like the stack is trying to allocate ~65 KB (65462 bytes) to ndp; off to work out how / where to configure that.
cc @SeppoTakalo @stevew817 @akselsm
Seems as if in both this issue and in issue #303 for mbed-os-example-client the problem is due to calling platform_enter_critical() in interrupt context.
Unfortunately, the two call stacks which lead to calling platform_enter_critical() in interrupt context are completely different (it even seems that they have nothing in common until platform_enter_critical()). The call stack trace for issue #303 can be found here.
So as a follow-up to that issue, I have poorly hacked the driver to pass signals back to a thread and juggled stack sizes, and it's all working.
@betzw it just comes back to not being able to use mutexes in ISRs I guess, as a function of mbed-os not supporting nested interrupts, which is fine if documented / not needed / not used.
Needs to be polished / PR'd, but the solution is in sight. It would be super cool if nanostack was ok with being called from an ISR, and also if it didn't crash horribly if you do do it in any mode but debug, but we're on the way up ^_^
Nanostack cannot be OK when called from ISR context, as it uses mutexes for protection. We don't have interrupt-safe mutexes in Mbed OS.
Requirement for using the worker thread is documented in Nanostack's driver API https://docs.mbed.com/docs/arm-ipv66lowpan-stack/en/latest/driver_api/
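A minimal sketch of the worker-thread hand-off discussed above, written as portable C++ so it can be tried on a host machine. All names here (RadioEvent, EventQueue, simulated_radio_isr, drain_events) are hypothetical and are not part of Nanostack or the EFR32 driver; the sketch assumes exactly one interrupt handler producing events and one worker thread consuming them.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical event record; a real driver would carry whatever its MAC callbacks need.
enum EventKind : uint8_t { TX_DONE, RX_DONE };

struct RadioEvent {
    EventKind kind;
    uint8_t handle;
};

// Single-producer (interrupt handler), single-consumer (worker thread) ring buffer.
// One slot is kept free to tell "full" from "empty", so it holds at most N - 1 events.
template <std::size_t N>
class EventQueue {
public:
    // Producer side: never blocks and never takes a mutex, so it is usable in a handler.
    bool push(const RadioEvent &ev) {
        std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) {
            return false; // queue full: the event is dropped
        }
        slots_[head] = ev;
        head_.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side: called only from the worker thread.
    bool pop(RadioEvent &out) {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) {
            return false; // queue empty
        }
        out = slots_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);
        return true;
    }

private:
    std::array<RadioEvent, N> slots_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

static EventQueue<8> g_events;

// Stand-in for the radio interrupt handler: it only records what happened.
void simulated_radio_isr(EventKind kind, uint8_t handle) {
    g_events.push(RadioEvent{kind, handle});
}

// Worker-thread body: drains the queue and makes the stack callbacks from
// thread context, where taking a mutex is allowed. Returns the number handled.
int drain_events(void (*deliver)(const RadioEvent &)) {
    int handled = 0;
    RadioEvent ev;
    while (g_events.pop(ev)) {
        deliver(ev);
        ++handled;
    }
    return handled;
}
```

On a real target the handler would also wake the worker with an RTOS signal or flag after pushing, and the worker would block on that signal instead of polling; that part is left out here because it depends on the RTOS API in use.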
A fix for this is in the works. I'll PR it when ready.
Getting same error on NUCLEOF767ZI
@danielklioc Please open a new ticket for NUCLEO since this is a platform-specific issue. NUCLEOF767ZI doesn't contain a radio, so it is highly unlikely it is the same root cause.
Description
Calling the mesh.connect() method causes:
Mutex 0x2000157c error -6: Not allowed in ISR context
I'm working on porting mbed-os and getting the network stack working on EFR32FG devices, based on the existing EFR32MG support, but having some troubles with ISRs trying to use the platform critical section mutex in what appears to be the precompiled network stack.
Bug / Question
Target: EFR32FG12
Toolchain: GCC_ARM
Toolchain version: arm-none-eabi-gcc (GNU Tools for ARM Embedded Processors 6-2017-q2-update) 6.3.1 20170620 (release) [ARM/embedded-6-branch revision 249437]
mbed-cli version: 1.2.0
mbed-os version: mbed-os-5.4.0-rc1-2137-g9b01a67
Expected behavior Building with nanostack should hopefully work.
Actual behavior Error in mutex access from ISR in NanostackRfPhyEfr32.cpp caused by underlying call to platform_enter_critical embedded in libnanostack.a. Causes error output:
Mutex 0x2000157c error -6: Not allowed in ISR context
when compiled in debug mode, and nasty startup failures if not.
Steps to reproduce Build and run ryankurte/node-mbed on an EFR32FG12 target (make f to use JLink adaptor as board does not have mbed file system support).
Optionally add a breakpoint instruction __asm__("BKPT #01") here to assist with catching the error.
Run debug server and gdb with make ds and make d.
Details
Building with the debug profile (mbed-cli compile --profile mbed-os/tools/profiles/debug.json).
Main looks like:
Which gives the output:
Dropping an __asm__("BKPT #X"); into error_msg in rtx_handlers.c to catch that error gives me the following stack trace.
It appears to me that the EFR32 nanostack PHY TX complete callback is calling the nanostack virtual mac, which is then calling the underlying platform_enter_critical function, which attempts to grab the critical section mutex and fails because it's called from an ISR. Calling into the mac from a tx complete interrupt seems, afaik, like a sensible thing to do.
As far as I can tell, the issue is kind of a duplicate of mbed-os-example-client#284, mbed-os-example-client#303, #4834 and #4904, but as applied to libnanostack, which as a closed source blob we cannot reasonably fix :-/
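To illustrate the failure path described above, here is a host-side sketch of a mutex that refuses to lock from interrupt context. Everything in it is a hypothetical stand-in: g_in_isr replaces the check an RTOS would make against the CPU's exception state, CheckedMutex and enter_critical are not the real platform_enter_critical implementation, and the value -6 is chosen only to mirror the error code in the log.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical status codes; -6 mirrors the "error -6" seen in the log output.
enum Status : int32_t { STATUS_OK = 0, STATUS_ERROR_ISR = -6 };

// Hypothetical flag standing in for "the CPU is currently in a handler".
static thread_local bool g_in_isr = false;

// A mutex wrapper that rejects locking from interrupt context, because a
// handler cannot block waiting for a thread to release the lock.
class CheckedMutex {
public:
    Status lock() {
        if (g_in_isr) {
            return STATUS_ERROR_ISR; // not allowed in ISR context
        }
        m_.lock();
        return STATUS_OK;
    }
    void unlock() { m_.unlock(); }

private:
    std::mutex m_;
};

static CheckedMutex g_critical;

// Stand-ins for a mutex-backed critical section.
Status enter_critical() { return g_critical.lock(); }
void exit_critical() { g_critical.unlock(); }

// Simulated TX-complete handler that calls into code using the critical section.
Status simulated_tx_done_isr() {
    g_in_isr = true;
    Status s = enter_critical();
    if (s == STATUS_OK) {
        exit_critical();
    }
    g_in_isr = false;
    return s;
}
```

Calling enter_critical() from ordinary thread code returns STATUS_OK, while simulated_tx_done_isr() returns STATUS_ERROR_ISR, which is the same shape of failure as the one reported here.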
Any ideas on how to solve this? I am working on an update to 5.5, but it appears this issue exists across mbed-os versions. One of the other issues drops the mutex attempt from the driver, except it's not the driver layer that is attempting it in this case. Other option I have come up with is to implement a radio thread to wrap the interrupt based phy, but it seems like a bit of a bodge.
Also as a sidenote, this error appearing as a hardfault / causing startup failure in non-debug builds is (imo) very difficult to debug, it might be interesting to wrap this error in a more useful manner for other build types as well as debug.
Thanks,
Ryan