PX4 / PX4-Autopilot

PX4 Autopilot Software
https://px4.io
BSD 3-Clause "New" or "Revised" License
7.91k stars 13.25k forks source link

px4-stm32f4discovery_default will dead when run into application #6613

Closed leomatrix closed 7 years ago

leomatrix commented 7 years ago

I git clone the lastest code of px4, and I use make px4-stm32f4discovery_default upload to flash this test firmware to stm32f4-discovery board. I find out that after bootloader the app will dead , the console output show that: up_hardfault: PANIC!!! Hard fault: 40000000 up_assert: Assertion failed at file:armv7-m/up_hardfault.c line: 171 task: Idle Task up_dumpstate: sp: 20000f98 up_dumpstate: IRQ stack: up_dumpstate: base: 20000fe8 up_dumpstate: size: 000002ec up_dumpstate: used: 00000134 up_stackdump: 20000f80: 00000000 20000f90 08010029 20000f98 20002464 0801003d 000000ab 20000b00 up_stackdump: 20000fa0: 00000000 20000fc0 20000fc0 08016731 00000000 20000fc0 08018e03 00000000 up_stackdump: 20000fc0: 00000000 08018e0b 08018de9 08018dd5 00000000 2000230c 00000000 00000000 up_stackdump: 20000fe0: 20002410 08010377 deadbeef 00000000 200009b4 00000001 00010000 00000000 up_dumpstate: sp: 200023e0 up_dumpstate: User stack: up_dumpstate: base: 20002464 up_dumpstate: size: 000001f4 up_dumpstate: used: 00000000 up_stackdump: 200023e0: deadbeef 00000000 20002430 00000000 08023970 00000000 00000002 20000b00 up_stackdump: 20002400: 00000000 00000000 0801ad23 00000000 08023990 00000000 08010dd3 00000000 up_stackdump: 20002420: 08023990 00000000 0801adc1 08023990 00000000 200009b4 2000086c 00000003 up_stackdump: 20002440: 0801041f 08010ae5 0800cc83 20002468 0001db98 00000000 00000000 deadbeef up_stackdump: 20002460: deadbeef deadbeef f30c81b7 73defb50 00000008 80000000 00000190 80000008 up_registerdump: R0: 08023991 20002414 deadbf02 08023991 deadbeef 00000000 00000000 20002410 up_registerdump: R8: 20002418 20002414 00000000 00000000 80000000 200023e0 0801ad23 08010e6e up_registerdump: xPSR: 01000200 BASEPRI: 00000000 CONTROL: 00000000 up_registerdump: EXC_RETURN: ffffffe9 up_taskdump: Idle Task: PID=0 Stack Used=0 of 0

leomatrix commented 7 years ago

I also find that if comment CONFIG_STACK_COLORATION in the defconfig this dead will disappear, I think the problem exist in stm32_start.c ===> go_os_start((FAR void *)&_ebss, CONFIG_IDLETHREAD_STACKSIZE); if I choose os_start directly ,the dead will not appear

leomatrix commented 7 years ago

I attempt to increase and decrease the value of CONFIG_IDLETHREAD_STACKSIZE, but this operation seem not take effect...

davids5 commented 7 years ago

@leomatrix - This is not that surprising the config is stale, and has not been vetted lately. Do not be misled by the CONFIG_STACK_COLORATION - it is doing what it is suppose to. The fault occurring in the code at the instruction prior to 0x0801ad23. You can use gdb to debug this.

leomatrix commented 7 years ago

@davids5 As you said I use arm-none-eabi-gdb to debug step by step ! when I run to go_os_start, The board will dead screenshot at: https://files.gitter.im/PX4/Firmware/kQMh/blob

leomatrix commented 7 years ago

"Set the IDLE stack to the stack coloration value then jump to os_start(). We take extreme care here because were currently executing on this stack." why do you do the operation that Set the IDLE stack to the stack coloration value ???

davids5 commented 7 years ago

@leomatrix Stack coloration is used to check the stack penetration. If you grep the code CONFIG_STACK_COLORATION you will see all the useage. In PX4 top reports the stack usage for each thread. This is is achieved using CONFIG_STACK_COLORATION.

All the boards share the stm32_start.c file you are looking at - this code is working and is not the problem unless there is a configuration error (defconfig) for that build.

In the past I have had issues with gdb from the 9.3. Please update your arm tools to 5.4.1 20160609 and do a clean build and try this again.

leomatrix commented 7 years ago

@davids5 Thank you for your help! I have change my arm-none-eabi- tools to 5.4.1 20160609, and make clean and make distclean, But using "make px4-stm32f4discovery_default upload" flash to stm32f4-discovery still dead... could you have a try ???

dcabecinhas commented 7 years ago

I've been doing some digging and I think I understand part of the problem now.

1) The CONFIG_STACK_COLORATION is writing 0xdeadbeef all over the .bss segment when the firmware starts. The .bss segment ranges 0x20000500-0x20002254.

2) The global variable g_root_node lives in .bss at 0x20000fe8 and also gets the value 0xdeadbeef.

3) Setting a watchpoint on g_root_node shows that the first use (read or write) after the coloration is in an inode_search() [file fs_inode.c], where g_root_node gets dereferenced and an exception is raised because 0xdeadbeef is not a valid memory address.

The problem seems to be that g_root_node never gets initialized. Why that would happen with coloring on and not coloring off (according to @leomatrix) seems strange.

@davids5 Any ideas where to start looking next?

davids5 commented 7 years ago

.bss is set to 0 here

Here is where go_os_start uses the CONFIG_IDLETHREAD_STACKSIZE=500 to write and increment from the end of .bss up to the Top of the IDLETHREAD stack.

So that should not overwrite what is below it.

The next possibility is a heap corruption - then when a task is created it's stack is placed in the address range of bss.

Find CONFIG_STACK_COLORATION and set break point on the setting code and just look at g_root_node before each call,

Also compare the defconfig file to that of px4fmu-v1

dcabecinhas commented 7 years ago

There is definitely some memory corruption going on by I am having troubles pinpointing the source.

I was confused before with the stack coloring. The IDLETHREAD_STACK (colored with 0xdeadbeef) gets placed after .bss, as it should.

The INTERRUPT_STACK (also colored with 0xdeadbeef) lives in the middle of .bss, between g_intstackalloc and g_intstackbase. The memory regions for there global variables are

g_intstackalloc  0x20000d00
g_intstackbase   0x20000fe8

All words from 0x20000d00 to 0x20000fe8 (inclusive) get written with 0xdeadbeef in up_color_intstack().

The problem is that 0x20000fe8 is also shared with g_root_inode and so should not be overwritten, or, at least, g_root_inode should shift by 4 bytes.

Breaking after up_color_intstack() finishes and setting g_root_inode=0 lets me proceed to boot to a (non-functional) console.

I see the registration for the ACM driver and then mount fails with no such file or directory. This will be easier to debug once the memory corruption is corrected.

@davids5 Do you have any idea of how to fix the memory corruption so that g_root_inode does not get overwritten?

Also, may I suggest in the future to color the idle task stack and the interrupt stack with two different values?

dcabecinhas commented 7 years ago

I compiled a px4fmu-v4 target and there the memory allocations of g_intstackbase and g_root_inode differ by 4 bytes, as they should.

dcabecinhas commented 7 years ago

I think I got to the root cause of the problem and have an idea for a solution.

It seems the problem is an upstream bug in NuttX that manifests in the Firmware code because of the unusual number for CONFIG_ARCH_INTERRUPTSTACK in px4-stm32f4 most px4fmu architectures (CONFIG_ARCH_INTERRUPTSTACK = 750 and 750 % 8 > 3).

I repeat, the bug is present in ALL arm architectures but in some of them we are "lucky" and it gets masked without being noticed.

In up_initialize() the interrupt stack is colored for 748 bytes (750 & ~3) .

However, in up_exception.S only 744 bytes (750 & ~7) are reserved in memory for the interrupt stack.

This leads to a stack overflow that colors also the first variable allocated after g_intstackbase. In the px4-stm32f4 configuration this is the g_root_inode variable resulting in a hardfault when 0xdeadbeef is dereferenced.

Using objdump we can spot the difference between the default configurations for px4-stm32f4 and px4fmu and why px4fmu does not crash.

~/Firmware$ objdump -t build_px4-stm32f4discovery_default/src/firmware/nuttx/firmware_nuttx |grep .bss|sort|grep -v '$d'|c++filt|grep g_root -C 3 -B 2
20000d00 g       .bss   000002e8 g_intstackalloc
20000fe8 g       .bss   00000000 g_intstackbase
20000fe8 g     O .bss   00000004 g_root_inode
20000fec l     O .bss   00000010 g_inode_sem
20000ffc l     O .bss   00000004 g_procfs_entries
20001000 l     O .bss   00000001 g_procfs_entrycount

~/Firmware$ objdump -t build_px4fmu-v4_default/src/firmware/nuttx/firmware_nuttx |grep .bss|sort|grep -v '$d'|c++filt|grep g_root -C 3 -B 3
20006b00 g       .bss   000002e8 g_intstackalloc
20006de8 g       .bss   00000000 g_intstackbase
20006de8 g     O .bss   00000004 .hidden __dso_handle
20006dec g     O .bss   00000004 g_root_inode
20006df0 l     O .bss   00000010 g_inode_sem
20006e00 l     O .bss   00000004 g_procfs_entries
20006e04 l     O .bss   00000001 g_procfs_entrycount

There is an additional symbol between g_intstackbase and g_root_inode that guards g_root_inode from the stack overflow. __dso_handle gets there due to the additional modules in px4fmu. If I add the uorb_tests module to px4-stm32f4 I can replicate this behavior and not crash on boot, as happens with the px4fmu configurations.

The bug arises due to different alignment assumption in up_initialize() (4 bytes) and up_exception.S (8 bytes).

The solution is to match these two in some way. Either align both to 4-bytes or align both to 8-bytes. This change will need to be performed throughout the NuttX codebase at least for all the mismatched ARM architectures.

As a test I simply allocated 8 bytes more of memory in up_exception.S and managed to get to a nsh console.

By the way, #6754 can be closed as a duplicate of this one as it is indeed the same bug.

Edit: Changed objdump quote to reflect a pristine firmware compile.

LorenzMeier commented 7 years ago

@dcabecinhas This is much appreciated, thank you! @davids5 Can propose a fix that gets accepted upstream.

Would you be interested in working with us on more HW platforms, in particular new boards? Your skill set seems like it could greatly benefit the dev community.

davids5 commented 7 years ago

@dcabecinhas Great Sleuthing!

Just some additional background. We have to maintain the 8 byte alignment for floating point alightment on stm32. I will do the PR for upstream nuttx and hot fix in PX4 master.

Thank you for all your efforts in debugging this!

davids5 commented 7 years ago

@dcabecinhas - Are you on skype?

dcabecinhas commented 7 years ago

Great. I'm glad to help.

leomatrix commented 7 years ago

@dcabecinhas @davids5 Thank you for your helps, you are so great...

davids5 commented 7 years ago

@leomatrix @dcabecinhas @LorenzMeier FYI: https://github.com/PX4/Firmware/pull/6784

Just to close the loop on my px4fmu-v3 build the corruption would have hit __dso_handle that is not used because CONFIG_SCHED_ONEXIT was not set.

dcabecinhas commented 7 years ago

@LorenzMeier Sure, I am interested. I've sent you an email.

@davids5 Commit 5a66539b361fa584cfe15596d0c39c4d8505b5a8 is only a partial fix, even if we are caring only about PX4.

We need to match all the .S with the corresponding .c for all architectures and boards.

At a first glance, at least common/up_checkstack.c should also be fixed and stm32f7/stm32_irq.c too since it probably uses the .S in armv7-m (or stm32). We need to determine which platforms use which chip architectures and fix the ones that should be 64 bit aligned.

~/Firmware$ grep -r CONFIG_ARCH_INTERRUPTSTACK NuttX/nuttx/arch/arm/|grep \~[37]|sort
NuttX/nuttx/arch/arm/src/a1x/a1x_irq.c:    size_t intstack_size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/arm/src/arm/up_assert.c:  istacksize = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/arm/src/arm/up_vectors.S:  .size   g_intstackalloc, (CONFIG_ARCH_INTERRUPTSTACK & ~3)
NuttX/nuttx/arch/arm/src/arm/up_vectors.S:  .skip   ((CONFIG_ARCH_INTERRUPTSTACK & ~3) - 4)
NuttX/nuttx/arch/arm/src/armv6-m/up_assert.c:  istacksize = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/arm/src/armv6-m/up_exception.S:    .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~3)
NuttX/nuttx/arch/arm/src/armv7-a/arm_assert.c:  istacksize = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/arm/src/armv7-a/arm_vectors.S: .size   g_intstackalloc, (CONFIG_ARCH_INTERRUPTSTACK & ~3)
NuttX/nuttx/arch/arm/src/armv7-a/arm_vectors.S: .skip   ((CONFIG_ARCH_INTERRUPTSTACK & ~3) - 4)
NuttX/nuttx/arch/arm/src/armv7-m/gnu/up_exception.S:    .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~7)
NuttX/nuttx/arch/arm/src/armv7-m/gnu/up_lazyexception.S:    .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~7)
NuttX/nuttx/arch/arm/src/armv7-m/up_assert.c:  istacksize = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/arm/src/armv7-r/arm_assert.c:  istacksize = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/arm/src/armv7-r/arm_vectors.S: .size   g_intstackalloc, (CONFIG_ARCH_INTERRUPTSTACK & ~3)
NuttX/nuttx/arch/arm/src/armv7-r/arm_vectors.S: .skip   ((CONFIG_ARCH_INTERRUPTSTACK & ~3) - 4)
NuttX/nuttx/arch/arm/src/c5471/c5471_vectors.S: .size   g_intstackalloc, (CONFIG_ARCH_INTERRUPTSTACK & ~3)
NuttX/nuttx/arch/arm/src/c5471/c5471_vectors.S: .skip   ((CONFIG_ARCH_INTERRUPTSTACK & ~3) - 4)
NuttX/nuttx/arch/arm/src/common/up_checkstack.c:  return (CONFIG_ARCH_INTERRUPTSTACK & ~3) - up_check_intstack();
NuttX/nuttx/arch/arm/src/common/up_checkstack.c:  return do_stackcheck((uintptr_t)&g_intstackalloc, (CONFIG_ARCH_INTERRUPTSTACK & ~3));
NuttX/nuttx/arch/arm/src/common/up_initialize.c:  for (size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/arm/src/efm32/efm32_irq.c:    size_t intstack_size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/arm/src/imx6/imx_irq.c:    size_t intstack_size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/arm/src/kinetis/kinetis_vectors.S: .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~7)
NuttX/nuttx/arch/arm/src/lpc17xx/lpc17_vectors.S:   .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~7)
NuttX/nuttx/arch/arm/src/sam34/sam_irq.c:    size_t intstack_size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/arm/src/sam34/sam_vectors.S:   .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~7)
NuttX/nuttx/arch/arm/src/sama5/sam_irq.c:    size_t intstack_size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/arm/src/samv7/sam_irq.c:    size_t intstack_size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/arm/src/stm32f7/stm32_irq.c:    size_t intstack_size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/arm/src/stm32/gnu/stm32_vectors.S: .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~7)
NuttX/nuttx/arch/arm/src/stm32/iar/stm32_vectors.S: .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~7)
NuttX/nuttx/arch/arm/src/stm32l4/stm32l4_irq.c:    size_t intstack_size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/arm/src/stm32/stm32_irq.c:    size_t intstack_size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/arm/src/tiva/tiva_vectors.S:   .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~7)
NuttX/nuttx/arch/arm/src/tms570/tms570_irq.c:  size_t intstack_size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);

The non-arm chips are all 32bit aligned and are not affected.

~/Firmware$ grep -r CONFIG_ARCH_INTERRUPTSTACK NuttX/nuttx/arch|grep \~[37]|grep -v arm| sort
NuttX/nuttx/arch/avr/src/avr32/up_dumpstate.c:  istacksize = (CONFIG_ARCH_INTERRUPTSTACK & ~3) - 4;
NuttX/nuttx/arch/avr/src/avr32/up_exceptions.S: .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~3)
NuttX/nuttx/arch/avr/src/avr/up_checkstack.c:  return (CONFIG_ARCH_INTERRUPTSTACK & ~3) - up_check_intstack();
NuttX/nuttx/arch/avr/src/avr/up_checkstack.c:  return do_stackcheck(start, (CONFIG_ARCH_INTERRUPTSTACK & ~3));
NuttX/nuttx/arch/avr/src/avr/up_checkstack.c:  uintptr_t start = (uintptr_t)g_intstackbase - (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/avr/src/common/up_initialize.c:  for (size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/hc/src/m9s12/m9s12_assert.c:  istacksize = (CONFIG_ARCH_INTERRUPTSTACK & ~3) - 4;
NuttX/nuttx/arch/mips/src/mips32/up_dumpstate.c:  istacksize = (CONFIG_ARCH_INTERRUPTSTACK & ~3) - 4;
NuttX/nuttx/arch/mips/src/pic32mx/pic32mx-head.S:#  define PIC32MX_INTSTACK_SIZE (CONFIG_ARCH_INTERRUPTSTACK & ~3)
NuttX/nuttx/arch/mips/src/pic32mx/pic32mx-head.S: *    End(+1): _ebss+CONFIG_IDLETHREAD_STACKSIZE+(CONFIG_ARCH_INTERRUPTSTACK & ~3)
NuttX/nuttx/arch/mips/src/pic32mx/pic32mx-head.S: *    Start:   _ebss+CONFIG_IDLETHREAD_STACKSIZE+(CONFIG_ARCH_INTERRUPTSTACK & ~3)
NuttX/nuttx/arch/mips/src/pic32mz/pic32mz-head.S:#  define PIC32MZ_INTSTACK_SIZE (CONFIG_ARCH_INTERRUPTSTACK & ~3)
NuttX/nuttx/arch/mips/src/pic32mz/pic32mz-head.S: *    End(+1): _ebss+CONFIG_IDLETHREAD_STACKSIZE+(CONFIG_ARCH_INTERRUPTSTACK & ~3)
NuttX/nuttx/arch/mips/src/pic32mz/pic32mz-head.S: *    Start:   _ebss+CONFIG_IDLETHREAD_STACKSIZE+(CONFIG_ARCH_INTERRUPTSTACK & ~3)
NuttX/nuttx/arch/misoc/src/lm32/lm32_dumpstate.c:  istacksize = (CONFIG_ARCH_INTERRUPTSTACK & ~3) - 4;
NuttX/nuttx/arch/renesas/src/sh1/sh1_dumpstate.c:  istacksize = (CONFIG_ARCH_INTERRUPTSTACK & ~3) - 4;
NuttX/nuttx/arch/renesas/src/sh1/sh1_vector.S:  .size   _g_intstackalloc, (CONFIG_ARCH_INTERRUPTSTACK & ~3)
NuttX/nuttx/arch/renesas/src/sh1/sh1_vector.S:  .skip   ((CONFIG_ARCH_INTERRUPTSTACK & ~3) - 4)
NuttX/nuttx/arch/risc-v/src/common/up_initialize.c:  for (size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/risc-v/src/nr5m100/nr5_head.S:#  define NR5M100_INTSTACK_SIZE (CONFIG_ARCH_INTERRUPTSTACK & ~3)
NuttX/nuttx/arch/risc-v/src/nr5m100/nr5_irq.c:    size_t intstack_size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
NuttX/nuttx/arch/risc-v/src/rv32im/up_dumpstate.c:  istacksize = (CONFIG_ARCH_INTERRUPTSTACK & ~3) - 4;
NuttX/nuttx/arch/x86/src/common/up_assert.c:  istacksize = (CONFIG_ARCH_INTERRUPTSTACK & ~3) - 4;
NuttX/nuttx/arch/xtensa/src/common/xtensa_dumpstate.c:  istacksize = (CONFIG_ARCH_INTERRUPTSTACK & ~3) - 4;
dcabecinhas commented 7 years ago

Just an add-on with some simplifications.

The 64-aligned architectures are

~/Firmware$ grep -r \~7 ./NuttX/nuttx/arch|grep ARCH_INTERRUPTSTACK|sort
./NuttX/nuttx/arch/arm/src/armv7-m/gnu/up_exception.S:  .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~7)
./NuttX/nuttx/arch/arm/src/armv7-m/gnu/up_lazyexception.S:  .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~7)
./NuttX/nuttx/arch/arm/src/kinetis/kinetis_vectors.S:   .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~7)
./NuttX/nuttx/arch/arm/src/lpc17xx/lpc17_vectors.S: .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~7)
./NuttX/nuttx/arch/arm/src/sam34/sam_vectors.S: .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~7)
./NuttX/nuttx/arch/arm/src/stm32/gnu/stm32_vectors.S:   .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~7)
./NuttX/nuttx/arch/arm/src/stm32/iar/stm32_vectors.S:   .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~7)
./NuttX/nuttx/arch/arm/src/tiva/tiva_vectors.S: .skip   (CONFIG_ARCH_INTERRUPTSTACK & ~7)

The C source files with 32-bit assumptions being compiled against the 64-bit aligned assembly S files seem to be the ones return in these grep expressions

~/Firmware$ grep -r \~3 ./NuttX/nuttx/arch/arm/src/armv7-m/|grep ARCH_INTERRUPTSTACK
./NuttX/nuttx/arch/arm/src/armv7-m/up_assert.c:  istacksize = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
~/Firmware$ grep -r \~3 ./NuttX/nuttx/arch/arm/src/kinetis/|grep ARCH_INTERRUPTSTACK
~/Firmware$ grep -r \~3 ./NuttX/nuttx/arch/arm/src/lpc17xx/|grep ARCH_INTERRUPTSTACK
~/Firmware$ grep -r \~3 ./NuttX/nuttx/arch/arm/src/sam34/|grep ARCH_INTERRUPTSTACK
./NuttX/nuttx/arch/arm/src/sam34/sam_irq.c:    size_t intstack_size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
~/Firmware$ grep -r \~3 ./NuttX/nuttx/arch/arm/src/stm32*|grep ARCH_INTERRUPTSTACK
./NuttX/nuttx/arch/arm/src/stm32/stm32_irq.c:    size_t intstack_size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
./NuttX/nuttx/arch/arm/src/stm32f7/stm32_irq.c:    size_t intstack_size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
./NuttX/nuttx/arch/arm/src/stm32l4/stm32l4_irq.c:    size_t intstack_size = (CONFIG_ARCH_INTERRUPTSTACK & ~3);
~/Firmware$ grep -r \~3 ./NuttX/nuttx/arch/arm/src/tiva/|grep ARCH_INTERRUPTSTACK
~/Firmware$

but please double check as I might have missed something.

Keep in mind that until now we were just searching for ARCH_INTERRUPTSTACK. Trying to identify other alignment problems we should look at references to ~7 which are not with ARCH_INTERRUPTSTACK.

~/Firmware$ grep -r \~7 ./NuttX/nuttx/arch|grep S| sort|grep -v ARCH_INTERRUPTSTACK
./NuttX/nuttx/arch/arm/src/armv7-a/addrenv.h:#  define ARCH_KERNEL_STACKSIZE ((CONFIG_ARCH_KERNEL_STACKSIZE + 7) & ~7)
./NuttX/nuttx/arch/arm/src/armv7-a/smp.h:#define SMP_STACK_SIZE       ((CONFIG_SMP_IDLETHREAD_STACKSIZE + 7) & ~7)
./NuttX/nuttx/arch/arm/src/lpc17xx/lpc17_lcd.c:  putreg32(CONFIG_LPC17_LCD_VRAMBASE & ~7, LPC17_LCD_LPBASE);
./NuttX/nuttx/arch/arm/src/lpc17xx/lpc17_lcd.c:  putreg32(CONFIG_LPC17_LCD_VRAMBASE & ~7, LPC17_LCD_UPBASE);
./NuttX/nuttx/arch/arm/src/samv7/sam_usbdevhs.c:#  define USBHS_ALIGN_DOWN(n) ((n) & ~7)
./NuttX/nuttx/arch/arm/src/samv7/sam_usbdevhs.c:#  define USBHS_ALIGN_UP(n)   (((n) + 7) & ~7)

I had a look at the .c files refering these files and there is a number 4 in CONFIG_ARCH_KERNEL_STACKSIZE - 4 which raises suspicions on arm_assert.c.

~/Firmware$ grep -r ARCH_KERNEL_STACKSIZE ./NuttX/nuttx/arch/arm/
./NuttX/nuttx/arch/arm/src/armv7-r/arm_assert.c:      kstackbase = (uint32_t)rtcb->xcp.kstack + CONFIG_ARCH_KERNEL_STACKSIZE - 4;
./NuttX/nuttx/arch/arm/src/armv7-r/arm_assert.c:      _alert("  size: %08x\n", CONFIG_ARCH_KERNEL_STACKSIZE);
./NuttX/nuttx/arch/arm/src/armv7-r/arm_syscall.c:              regs[REG_SP]      = (uint32_t)rtcb->xcp.kstack + ARCH_KERNEL_STACKSIZE;
./NuttX/nuttx/arch/arm/src/armv7-a/arm_assert.c:      kstackbase = (uint32_t)rtcb->xcp.kstack + CONFIG_ARCH_KERNEL_STACKSIZE - 4;
./NuttX/nuttx/arch/arm/src/armv7-a/arm_assert.c:      _alert("  size: %08x\n", CONFIG_ARCH_KERNEL_STACKSIZE);
./NuttX/nuttx/arch/arm/src/armv7-a/arm_syscall.c:              regs[REG_SP]      = (uint32_t)rtcb->xcp.kstack + ARCH_KERNEL_STACKSIZE;
./NuttX/nuttx/arch/arm/src/armv7-a/arm_addrenv_kstack.c:  binfo("tcb=%p stacksize=%u\n", tcb, ARCH_KERNEL_STACKSIZE);
./NuttX/nuttx/arch/arm/src/armv7-a/arm_addrenv_kstack.c:  tcb->xcp.kstack = (FAR uint32_t *)kmm_memalign(8, ARCH_KERNEL_STACKSIZE);
./NuttX/nuttx/arch/arm/src/armv7-a/addrenv.h:#  define ARCH_KERNEL_STACKSIZE ((CONFIG_ARCH_KERNEL_STACKSIZE + 7) & ~7)

The files where the other constants are refered seem ok but it would be better to have someone from the NuttX project have a look and doublecheck.

Edit: Changed one of the greps to stm32* so it also matches stm32f7 and stm32l4.

davids5 commented 7 years ago

@dcabecinhas

Yes we will have to scrutinise this on a case by case basis to see if there are any that are real issues.

I do not think there are because the up_assert.c is just displaying memory after a hard fault and establishing a word alignment for the %08 printing,

In a hard fault situation, what is at the TOS is not what we care about and seeing extra words there is OK. We do care about the what is at the BOS and if it is has been written within to 1,2,3 32bit words of the bottom then it is too small.

The code change I did just assumes the larger truncation on the size for the 8 byte alignment. So we only color the bottom up and stop before the real top or at the top. This is also fine because the stack will be used full-descending and use TOS on the first ISR. So coloring it has not values as we only care about the BOS.

dcabecinhas commented 7 years ago

@davids5 There are still stack overflows in the NuttX code. You can check it out by breaking in up_initialize() and noticing that stackbase is outside the stack and will be colored in the first instruction.

I created PR 269 in the NuttX repo with a better solution. Once a solution is accepted in NuttX you can backport it to the Firmware

davids5 commented 7 years ago

@dcabecinhas - I posted to the outstanding PR on upstream the results of our discussion this AM. Once you get the additional change in I will back port it.

davids5 commented 7 years ago

Merge to upstream NuttX and back ported via https://github.com/PX4/Firmware/commit/7d62aa6a6d9b64977319bbfe66aab23e67884072