RIOT-OS / RIOT

RIOT - The friendly OS for IoT
https://riot-os.org
GNU Lesser General Public License v2.1

Hard fault triggered depending on power supply? #4470

Closed sreibs closed 8 years ago

sreibs commented 8 years ago

Hi there,

I run RIOT on an STM32F0 MCU, and a hard fault is triggered every time I turn the system on with one particular power supply. If the system is powered from USB instead, everything works fine.

As far as I can tell, a supply-voltage problem is not among the documented causes of a HardFault. Still, the problem does seem to come from the supply, as that is the only difference between the working and the non-working case.

I also replaced my whole application with the hello-world example, and the problem still occurs; see the trace below.

I checked the supply voltage (3.3 V); in all cases it is clean and flat...

Do you think this is a supply problem or a software problem?

Any help will be greatly appreciated!

Thank you


2015-12-12 11:08:08,347 - INFO # kernel_init(): This is RIOT! (Version: 14e5-XXXX)
2015-12-12 11:08:08,351 - INFO # kernel_init(): jumping into first task...
2015-12-12 11:08:08,353 - INFO # UART0 thread started.
2015-12-12 11:08:08,355 - INFO # uart0_init() [OK]
2015-12-12 11:08:08,356 - INFO # Hello World!
2015-12-12 11:08:08,361 - INFO # You are running RIOT on a(n) wattwatcher1 board.
2015-12-12 11:08:08,363 - INFO # This board features a(n) stm32f0 MCU.
2015-12-12 11:08:08,432 - INFO #
2015-12-12 11:08:08,434 - INFO # Context before hardfault:
2015-12-12 11:08:08,436 - INFO # r0: 0x00000001
2015-12-12 11:08:08,437 - INFO # r1: 0x00000001
2015-12-12 11:08:08,439 - INFO # r2: 0x00000002
2015-12-12 11:08:08,440 - INFO # r3: 0x681b2001
2015-12-12 11:08:08,442 - INFO # r12: 0x00000000
2015-12-12 11:08:08,444 - INFO # lr: 0x08000b25
2015-12-12 11:08:08,445 - INFO # pc: 0x08000b1e
2015-12-12 11:08:08,447 - INFO # psr: 0x01000000
2015-12-12 11:08:08,447 - INFO #
2015-12-12 11:08:08,448 - INFO # Misc
2015-12-12 11:08:08,449 - INFO # EXC_RET: 0xfffffffd
2015-12-12 11:08:08,453 - INFO # Attempting to reconstruct state for debugging...
2015-12-12 11:08:08,454 - INFO # In GDB:
2015-12-12 11:08:08,456 - INFO # set $pc=0x8000b1e
2015-12-12 11:08:08,457 - INFO # frame 0
2015-12-12 11:08:08,457 - INFO # bt

DipSwitch commented 8 years ago

Good morning,

What MCU are you using exactly? It seems that the stack pointer is initialized to a faulty address. (PSR register)

The SRAM starts at 0x20000000.

Is the schematic available somewhere?

DipSwitch commented 8 years ago

Sorry, PSR is not the stack pointer, but I'm still wondering if there is a schematic available :)

sreibs commented 8 years ago

Hi,

Thank you for the quick reply.

The MCU is a STM32F0C8T6.

The board is homemade. The connections are very simple: 2 SPI buses for an ADC and a radio transceiver, and 4 logic-level signals that have been disconnected for debugging. The reset is tied high. Then there is the programming interface (SWD) and... that's it.

Do you see a reason why the program works fine with one power supply and not with another? Can a hard fault be triggered by a voltage problem or other hardware causes?

sreibs commented 8 years ago

And there is also a 32 kHz crystal.

DipSwitch commented 8 years ago

The first thing that comes to mind is the difference between the two power supply circuits. I assume that the USB power supply has a voltage regulator. What about the power circuit that the external power supply is connected to? Can the external power supply also deliver 500 mA? Or 757 mA if it's a 3V3 power supply? If you inject the 3V3 after the USB power supply, is the USB voltage regulator disconnected from Vcc? Some regulators don't like it when there is a voltage on the output pin while there is none on the input pin. If the power is injected after the USB power supply, does it have capacitors on the line?

Do you have a scope? You can check whether the power on the board is nice and clean, without too much noise.

You could measure the current the board draws when connected to USB and when it's connected to the external PSU. If there is a big difference, there is probably an error in the power circuit.

Do you use / initialize the USB? (If the MCU has any USB capabilities, that is. I'm on the road =) )
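
The 757 mA figure above appears to come from assuming the same power budget on both rails: 5 V at 500 mA is 2.5 W, and 2.5 W at 3.3 V is about 757 mA. A minimal sketch of that arithmetic, assuming an ideal (lossless) conversion; the function name is made up for illustration:

```c
#include <stdint.h>

/* Current (in mA) needed at v_out_mv to deliver the same power as
 * i_in_ma at v_in_mv, assuming a lossless conversion.
 * Integer division truncates the result. */
static uint32_t equivalent_current_ma(uint32_t v_in_mv, uint32_t i_in_ma,
                                      uint32_t v_out_mv)
{
    return (v_in_mv * i_in_ma) / v_out_mv;
}
```

Calling equivalent_current_ma(5000, 500, 3300) gives 757. Note that this equal-power reasoning only applies to a switching converter; with a linear regulator (LDO) the input current is roughly equal to the output current.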

DipSwitch commented 8 years ago

Maybe a brown-out is triggered due to a dip in the voltage.

DipSwitch commented 8 years ago

You could single-step through your code from 'main()' and see where the hard fault occurs. Maybe it happens when you try to initialize some external peripheral?

sreibs commented 8 years ago

Yes, the main voltage goes through a 5 V LDO and then a 3.3 V LDO. The power rails are clean for both power supplies (I have a scope).

Current consumption is normal (about 50 mA max) for both power supplies.

I tried with a "hello world" main (so no external peripheral is initialized or used) and the problem is still there (see the trace in my first post).

I first thought it was a brown-out, but first, I can't see any on the scope (µs resolution), and second, I am not sure a brown-out can cause a HardFault on a Cortex-M0?

I observe that PC is almost always at "pc: 0x08000b1e". Could it be a clue?

DipSwitch commented 8 years ago

What you can do is look at the map file, created in ${RIOT_BASE}/examples/hello-world/bin/${BOARD}/hello-world.map, and see what is placed at that address. When I compile for the board iotlab-m3, for example, I see that cpu_init is placed there. But for your board it will probably be another function ;)

 .text.atomic_cas
            0x0000000008000a48       0x1c /home/dipswitch/RIOT/examples/hello-world/bin/iotlab-m3/cortexm_common.a(atomic_arch.o)
            0x0000000008000a48                atomic_cas
 .text.cpu_init
            0x0000000008000a64       0xbc /home/dipswitch/RIOT/examples/hello-world/bin/iotlab-m3/cpu.a(cpu.o)
            0x0000000008000a64                cpu_init
 .text.lpm_arch_set
            0x0000000008000b20        0x4 /home/dipswitch/RIOT/examples/hello-world/bin/iotlab-m3/cpu.a(lpm_arch.o)
            0x0000000008000b20                lpm_arch_set

And the MCU STM32F0C8T6 is not listed on the STM32F0 page. You're probably missing two characters there; STM32F0##C8T6 would make sense :)

punchcard60 commented 8 years ago

You seem to be missing a digit from the part number. It should be of the form STM32F0xC8T6 where the x is a digit. Is this the case?

DipSwitch commented 8 years ago

And a brown-out would indeed most likely trigger a reset, not a hard fault :)

punchcard60 commented 8 years ago

Since the difference shows between two power supplies, I'm wondering if power supply rise time might be putting things into some sort of funky mode? Need the full part number to check into it.

punchcard60 commented 8 years ago

Hard Fault seems to be mostly triggered by flash protection level violation.

DipSwitch commented 8 years ago

And since the exception return value reported for the hard fault is EXC_RET: 0xfffffffd, it is probably an interrupt handler that is hard-faulting your system. Which peripherals are initialized in your board configuration?

@punchcard60 Or a misaligned memory access (the M0 can't access misaligned memory), a null pointer dereference, a bus fault (although not in this case, I guess), or an access to an unavailable memory address (like an incorrect peripheral base address).
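
For reference, a minimal sketch of how the EXC_RET and psr values from the trace can be decoded, assuming the ARMv6-M (Cortex-M0) encoding, in which only three EXC_RETURN values are valid. The type and function names are made up for illustration and are not part of RIOT. Under that encoding, 0xfffffffd denotes a return to Thread mode on the process stack, and a stacked PSR of 0x01000000 carries exception number 0, which would suggest the fault was taken from thread code rather than from inside an interrupt handler.

```c
#include <stdint.h>

/* Where the CPU was executing when the exception was taken
 * (hypothetical enum, for illustration only). */
typedef enum {
    EXC_RET_HANDLER_MSP,  /* 0xFFFFFFF1: Handler mode, main stack */
    EXC_RET_THREAD_MSP,   /* 0xFFFFFFF9: Thread mode, main stack */
    EXC_RET_THREAD_PSP,   /* 0xFFFFFFFD: Thread mode, process stack */
    EXC_RET_INVALID       /* anything else is reserved on ARMv6-M */
} exc_return_kind_t;

static exc_return_kind_t decode_exc_return(uint32_t exc_return)
{
    switch (exc_return) {
    case 0xFFFFFFF1u: return EXC_RET_HANDLER_MSP;
    case 0xFFFFFFF9u: return EXC_RET_THREAD_MSP;
    case 0xFFFFFFFDu: return EXC_RET_THREAD_PSP;
    default:          return EXC_RET_INVALID;
    }
}

/* Bits [5:0] of the stacked xPSR hold the number of the exception that
 * was active when the fault hit; 0 means none (plain thread code). */
static uint32_t stacked_exception_number(uint32_t psr)
{
    return psr & 0x3Fu;
}

/* Bit 24 of the stacked xPSR is the Thumb state bit; it is expected to
 * be 1 on a Cortex-M core. */
static int stacked_thumb_bit(uint32_t psr)
{
    return (int)((psr >> 24) & 1u);
}
```

With the values from the first post, decode_exc_return(0xfffffffd) yields EXC_RET_THREAD_PSP and stacked_exception_number(0x01000000) yields 0.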

sreibs commented 8 years ago

Sorry about that, MCU is STM32F030C8T6.

I am not sure that the rise time is the problem, as the beginning of the program runs normally (you can see the "Hello World!" in the trace, and the hard fault occurred 5 ms later).

Thank you for checking; a hard fault cannot come from a brown out.

I'll check the .map file

kaspar030 commented 8 years ago

a hard fault cannot come from a brown out.

But from a misbehaving interrupt caused by a brown out.

sreibs commented 8 years ago

In the hello-world example nothing is initialized.

How can I know which interrupt the hardfault handler is called from?

punchcard60 commented 8 years ago

@DipSwitch yeah, no requirements for rise time.

DipSwitch commented 8 years ago

You can see this in the map file :) Have you found the function yet?

And it is not entirely true that nothing is initialized: the system timer, the RTT, and maybe the transceiver you configured in your board_perh.h do get initialized by the auto_init() function.

kaspar030 commented 8 years ago

How can I know which interrupt the hardfault handler is called from?

Check the map where PC points to.

sreibs commented 8 years ago

In the map I have:

 .text.idle_thread
            0x08000b18       0x14 /home/seb/EmbeddedArm/emb/RIOT/examples/wattwatcher_app/bin/wattwatcher1/core.a(kernel_init.o)
 .text.kernel_init
            0x08000b2c       0x94 /home/seb/EmbeddedArm/emb/RIOT/examples/wattwatcher_app/bin/wattwatcher1/core.a(kernel_init.o)
            0x08000b2c                kernel_init
 .text
            0x08000bc0        0x0 /home/seb/EmbeddedArm/emb/RIOT/examples/wattwatcher_app/bin/wattwatcher1/core.a(msg.o)

Sometimes PC is 0x08000956:

 .text.uart_write_blocking
            0x0800094c       0x18 /home/seb/EmbeddedArm/emb/RIOT/examples/wattwatcher_app/bin/wattwatcher1/periph.a(uart.o)
            0x0800094c                uart_write_blocking
 .text.uart_poweron
            0x08000964       0x28 /home/seb/EmbeddedArm/emb/RIOT/examples/wattwatcher_app/bin/wattwatcher1/periph.a(uart.o)
            0x08000964                uart_poweron

Sometimes it is 0x080070FE:

 .text._printf_i
            0x08007028      0x230 /home/seb/EmbeddedArm/gcc-arm-none-eabi-4_9-2015q1/bin/../lib/gcc/arm-none-eabi/4.9.3/../../../../arm-none-eabi/lib/armv6-m/libc_nano.a(lib_a-nano-vfprintf_i.o)
            0x08007028                _printf_i
 .text
            0x08007258        0x0 /home/seb/EmbeddedArm/gcc-arm-none-eabi-4_9-2015q1/bin/../lib/gcc/arm-none-eabi/4.9.3/../../../../arm-none-eabi/lib/armv6-m/libc_nano.a(lib_a-stdio.o)
 .text.__sread
            0x08007258       0x28 /home/seb/EmbeddedArm/gcc-arm-none-eabi-4_9-2015q1/bin/../lib/gcc/arm-none-eabi/4.9.3/../../../../arm-none-eabi/lib/armv6-m/libc_nano.a(lib_a-stdio.o)
            0x08007258                __sread

Could it be a problem with printing on the UART? Why would it work with another power supply?
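
The manual lookup described above (find the section whose start address and size enclose the faulting PC) can be sketched as follows. This is only an illustration: the table is hand-copied from the map excerpts quoted in this comment, and the type and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t start;    /* section start address from the map file */
    uint32_t size;     /* section size in bytes */
    const char *name;  /* section name as printed by the linker */
} map_entry_t;

/* Entries copied by hand from the map excerpts in this thread. */
static const map_entry_t map_entries[] = {
    { 0x0800094cu, 0x18u,  ".text.uart_write_blocking" },
    { 0x08000964u, 0x28u,  ".text.uart_poweron" },
    { 0x08000b18u, 0x14u,  ".text.idle_thread" },
    { 0x08000b2cu, 0x94u,  ".text.kernel_init" },
    { 0x08007028u, 0x230u, ".text._printf_i" },
};

/* Return the entry whose [start, start + size) range contains addr,
 * or NULL if the address is not covered by the table. */
static const map_entry_t *map_lookup(uint32_t addr)
{
    const size_t n = sizeof(map_entries) / sizeof(map_entries[0]);
    for (size_t i = 0; i < n; i++) {
        const map_entry_t *e = &map_entries[i];
        if (addr >= e->start && (addr - e->start) < e->size) {
            return e;
        }
    }
    return NULL;
}
```

With this table, 0x08000b1e resolves to .text.idle_thread, 0x08000956 to .text.uart_write_blocking, and 0x080070FE to .text._printf_i. The lr value from the trace (0x08000b25, with the Thumb bit cleared) also falls inside .text.idle_thread. If the ELF file is at hand, arm-none-eabi-addr2line with the -e option does the same job without a hand-made table.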

sreibs commented 8 years ago

@DipSwitch I have to go for a couple of hours. I will check the initialization when I get back. Thank you for your help!

DipSwitch commented 8 years ago

Are you running from the internal oscillator or an external crystal? Do you use the PLL? I've also seen this behavior before when running at 8 MHz with the debugger connected; for some reason the debugger interferes with the MCU, and disabling all breakpoints could solve the problem. If the location is random, it could mean that the clock is unstable (which can occur if the external crystal doesn't have the proper capacitors to ground) or that the power is unstable.

If it's always after 5 minutes my first guess would be a timer of some sort though...

DipSwitch commented 8 years ago

And the kernel_init is strange, since after Hello World kernel_init should never be called, unless the MCU resets...

sreibs commented 8 years ago

It is not 5 min but 5 ms. It actually occurred between 5 and 200 ms.

If there is a reset it should be visible in the trace, shouldn't it?

I don't have the debugger connected (by the way, I've also seen that in other projects).

I have an external crystal, but only a 32 kHz one for timekeeping; it is not the main clock source.

sreibs commented 8 years ago

If it is a timing problem (due to the rise time or a voltage dip), would it be possible to reset the MCU on a hard fault rather than freezing it?
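
On the question of resetting instead of freezing: a minimal sketch, assuming the standard ARMv6-M mechanism of requesting a reset through the AIRCR register in the System Control Block. This is not how RIOT's own hard fault handler is written; the function names are made up for illustration, and only the register address, the key, and the bit position come from the architecture.

```c
#include <stdint.h>

/* Architecturally defined on ARMv6-M: Application Interrupt and Reset
 * Control Register, its write key, and the system reset request bit. */
#define SCB_AIRCR_ADDR     0xE000ED0Cu
#define AIRCR_VECTKEY      (0x05FAu << 16)
#define AIRCR_SYSRESETREQ  (1u << 2)

/* Request a system reset by writing the key plus SYSRESETREQ.
 * The register is passed in as a pointer so that the function can also
 * be exercised off-target against an ordinary variable. */
static inline void request_system_reset(volatile uint32_t *aircr)
{
    *aircr = AIRCR_VECTKEY | AIRCR_SYSRESETREQ;
}

/* Hypothetical handler body: on target, something like this would be
 * called from the hard fault vector instead of printing and halting. */
void hard_fault_reset_sketch(void)
{
    request_system_reset((volatile uint32_t *)(uintptr_t)SCB_AIRCR_ADDR);
    for (;;) {
        /* wait here until the reset takes effect */
    }
}
```

On target, CMSIS offers NVIC_SystemReset(), which performs the same write together with the required memory barriers and is likely the better choice. Keep in mind that resetting on every hard fault masks the symptom without addressing the cause.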

sreibs commented 8 years ago

I scoped the boot-up voltage. The rise time is roughly 350 µs. The overshoot is 500 mV above 5 V for 50 µs. Then the voltage is clean.

On the 3.3 V supply line the rise time is shorter (150 µs). The overshoot is 100 mV above 3.3 V.

Right after the overshoot the line is flat.

I don't see why it would be a brown out as the hardfault occurs 200ms after first MCU output on UART.

sreibs commented 8 years ago

I tested powering a single MCU on its own from the power supply that causes the problem. The problem is the same. I guess the supply is not as clean as it looks on the scope. I still don't understand why a "hardfault" is triggered.

Maybe an interrupt is raised and there is no handler for it. But this MCU does not even have a programmable voltage detector (PVD). I can't see where the interrupt could come from.

punchcard60 commented 8 years ago

The more I think about it, the more I'm sure that @Dipswitch is right. It has to be something about the circuit that controls which supply (USB/line) powers the MCU. If it were a spike, it wouldn't happen at the same address each time, unless the spike comes from turning some other device on the board on or off.

sreibs commented 8 years ago

I also think so. However I still don't understand why a hardfault is triggered...

The address in PC is not in "kernel_init" but in "idle_thread". Since the software is doing almost nothing, the program is most likely in the idle thread almost all the time... so it makes sense.

If it is something with the supply, I can't understand why the program is able to boot up, correctly write some data to the UART (meaning the clock is stable), and then suddenly stop.

I posted on the ST forum to double-check the voltage sensitivity of this MCU.

Any other lead is welcome.

punchcard60 commented 8 years ago

Just to confirm - you're running the hello world example in a totally unmodified copy of RIOT or have there been changes? What is happening with the usb at the time?

sreibs commented 8 years ago

Hi,

I actually made some changes to RIOT, first because neither my MCU nor, obviously, my board was supported. And I am not on the latest release.

I will test on the latest release today.

sreibs commented 8 years ago

Hi there,

I scoped the NRST signal and managed to see that there are falling edges in sync with small spikes on the supply line (80 mV, < 5 µs)... So I assume that it is definitely a hardware problem.

I still don't understand why it ends up in a HardFault exception, and I am surprised this STM32 is so sensitive to voltage spikes.

I have a thread open on the ST forum to clarify this.

Thank you for your time and advice.

punchcard60 commented 8 years ago

The datasheet recommends a 0.1 µF capacitor on the NRST pin to help minimize noise. I'm happy that you found the problem. Good luck!

sreibs commented 8 years ago

Yes, I have the capacitor on NRST. I measured the voltage on each VDD pin and the spike is more like 40 mV, thanks to the decoupling capacitors. The MCU would have to be very, very sensitive for that to be the reason...

sreibs commented 8 years ago

Hi everyone,

The problem came from the board and the way the STM was grounded.

I still don't know why it triggered a hard fault, but it was indeed a hardware problem and not a software one at all.

Thanks to everyone.

Jarnisan commented 5 years ago

Can you give more details? I'm having a similar problem.