adafruit / Adafruit_nRF52_Arduino

Adafruit code for the Nordic nRF52 BLE SoC on Arduino

HardFault Exception 100% when debugger enabled at reboot #543

Closed henrygab closed 4 years ago

henrygab commented 4 years ago

I have no idea how to even begin debugging a HardFault exception. @hathach, have you seen this behavior before? Do you have any recommendations on how to determine a root cause? Any pointers on tracking down the causes of HardFault exceptions would be appreciated.

Describe the bug 100% repro: having a breakpoint enabled causes a HardFault exception at reboot.

Set up (mandatory)

To Reproduce Steps to reproduce the behavior:

  1. Compile "Blink" example
  2. Using Ozone, choose Debug-->Start Debugging-->Download and Reset Program
  3. Step through main() function ... everything working wonderfully
  4. Add a breakpoint at loop() (e.g., console command Break.SetOnSrc ("Blink.ino:34");)
  5. Reset program to main() (e.g., press F4 or console command Debug.Reset())
  6. Verify breakpoint still exists, and continue (e.g., press F5) ... expect to hit breakpoint in loop(), but HardFault Exception occurs instead
  7. Stop Debugging from Ozone
  8. Using Ozone, choose Debug-->Start Debugging-->Attach to Running program
  9. Notice that breakpoints are set and work correctly

Expected Results Breakpoints should work across reboots of the board.

Actual Results Breakpoints cannot be set by debugger ... something about the board initialization (post-main()) appears to conflict?

Serial Log There is no output ... perhaps the hardfault exception occurs too early?

hathach commented 4 years ago

I mainly use Ozone to troubleshoot as well. I don't remember the last time I dealt with a HardFault on the nRF52. However, update everything to the latest version, and don't forget to pull the submodule (tinyusbcore) as well.

tannewt commented 4 years ago

Can you reproduce this outside of arduino? I've debugged hard faults with GDB but not Ozone. The system control block has helpful registers. I believe you can end up in hard fault if another fault occurs but it's not handled by something else.

See https://static.docs.arm.com/dui0553/b/DUI0553.pdf section 4-3. The NVIC can be helpful too which is section 4-2.

henrygab commented 4 years ago

Hi @tannewt,

Can you help me understand what you mean by "outside of arduino"?

I flash the binary and start debugging using Ozone (w/Segger JLINK EDU).

This same problem has occurred in the past, with the same steps to reproduce (any breakpoint set when the device restarts == HardFault).

Based on your pointers, I've looked into the registers some, based on a 2019 guide, and the doc you pointed to.

NVIC / SCB register summary

| Address    | Register | Value     | Notes |
|------------|----------|-----------|-------|
| 0xE000E008 | ACTLR    | 0000 0000 | |
| 0xE000ED00 | CPUID    | 410F C241 | |
| 0xE000ED04 | ICSR     | 0000 0003 | 3 == HardFault exception |
| 0xE000ED08 | VTOR     | 0000 0000 | Vector table offset is zero |
| 0xE000ED0C | AIRCR    | FA05 0000 | (boring) |
| 0xE000ED10 | SCR      | 0000 0000 | (boring) |
| 0xE000ED14 | CCR      | 0000 0200 | STKALIGN=1: stack is 8-byte aligned on exception entry |
| 0xE000ED18 | SHPR1    | 0000 0000 | All priorities are zero |
| 0xE000ED1C | SHPR2    | 0000 0000 | All priorities are zero |
| 0xE000ED20 | SHPR3    | 0000 0000 | All priorities are zero |
| 0xE000ED24 | SHCSR    | 0000 0000 | No exceptions enabled, pending, active (?) |
| 0xE000ED28 | CFSR     | 0000 8200 | See next three rows |
| 0xE000ED28 | MMFSR    | 00        | No memory management fault |
| 0xE000ED29 | BFSR     | 82        | Bus access fault, BFARVALID=1 and PRECISERR=1 |
| 0xE000ED2A | UFSR     | 0000      | No usage fault |
| 0xE000ED2C | HFSR     | 4000 0000 | **FORCED == 1** |
| 0xE000ED34 | MMFAR    | A801 BE40 | Not valid (not a memory management fault) |
| 0xE000ED38 | BFAR     | A801 BE40 | Address of the precise data access fault |
| 0xE000ED3C | AFSR     | 0000 0000 | No additional fault info |

* NVIC shows only interrupt 0 is enabled and active, priority b1110_0000.
* NVIC shows no interrupts are pending.
* Stack pointer is at 2003 FF90.
* Stack was used up to 2003 FF68 (based on the ADADADAD data pattern).


Of all these, the `HFSR` register having `FORCED=1` looks promising. Recommendations or pointers for figuring this out would be most welcome, as it's rather a pain not to be able to connect the debugger from the start.
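
The `CFSR` and `HFSR` values in the table above can also be split into their fields mechanically instead of by hand. The following is a minimal host-side sketch, assuming the standard ARMv7-M bit layout (MMFSR in `CFSR[7:0]`, BFSR in `CFSR[15:8]`, UFSR in `CFSR[31:16]`, `FORCED` at `HFSR` bit 30). The type `fault_summary_t` and the function `decode_fault` are hypothetical names introduced here for illustration; they are not part of any SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit masks per the ARMv7-M System Control Block register layout. */
#define HFSR_FORCED      (1u << 30) /* escalated from a configurable fault */
#define BFSR_PRECISERR   (1u << 1)  /* precise data bus error */
#define BFSR_BFARVALID   (1u << 7)  /* BFAR holds the faulting address */

/* Hypothetical summary type, for illustration only. */
typedef struct {
    uint8_t  mmfsr;             /* CFSR[7:0]   */
    uint8_t  bfsr;              /* CFSR[15:8]  */
    uint16_t ufsr;              /* CFSR[31:16] */
    bool     forced;
    bool     precise_bus_error;
    bool     bfar_valid;
} fault_summary_t;

/* Split raw CFSR/HFSR values (as read from a debugger) into fields. */
void decode_fault(uint32_t cfsr, uint32_t hfsr, fault_summary_t *out)
{
    assert(out != NULL);
    out->mmfsr = (uint8_t)(cfsr & 0xFFu);
    out->bfsr  = (uint8_t)((cfsr >> 8) & 0xFFu);
    out->ufsr  = (uint16_t)(cfsr >> 16);
    out->forced            = (hfsr & HFSR_FORCED) != 0;
    out->precise_bus_error = (out->bfsr & BFSR_PRECISERR) != 0;
    out->bfar_valid        = (out->bfsr & BFSR_BFARVALID) != 0;
}
```

Called with `CFSR = 0x00008200` and `HFSR = 0x40000000`, this reports a BFSR of `0x82` with `PRECISERR` and `BFARVALID` set and `FORCED` set, which agrees with the notes in the table: a precise bus fault at the address in `BFAR`, escalated to a HardFault.
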
manual register recovery

```
LR&0x4 == 0, so stack found via MSP
MSP     == 0x2003FFC0
R00     == 00000000
R01     == 2003FFE0
R02     == A801BE2C
R03     == 00000010
R12     == 00000030
LR  R14 == FFFFFFF9
pc  R15 == 00000ACA
xPSR    == 8100000B
```

Interesting registers:

* `MSP` is 0x40 bytes away from the highest RAM address
* `R01` is 0x20 bytes away from the highest RAM address
* `R14/LR` is `0xFFFFFFF9` (`-7` as a signed value), which has the form of an `EXC_RETURN` value rather than a code address
* `R15/PC` is an address that appears to be in page 0 of flash memory(?!)
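
The manual recovery above follows the basic (no-FPU) Cortex-M exception frame: eight words pushed in the order R0, R1, R2, R3, R12, LR, PC, xPSR, on the stack selected by bit 2 of `EXC_RETURN`. Here is a minimal sketch of that decoding on a host machine, assuming the eight words have already been copied out of target memory. The names `exception_frame_t`, `frame_is_on_psp`, `read_frame`, and `active_exception` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Basic exception frame, in the order the core pushes it (lowest address first). */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} exception_frame_t;

/* EXC_RETURN bit 2: 0 = frame was pushed on MSP, 1 = frame was pushed on PSP. */
int frame_is_on_psp(uint32_t exc_return)
{
    return (exc_return & 0x4u) != 0;
}

/* Build a frame from eight consecutive words read at the stacked SP. */
exception_frame_t read_frame(const uint32_t *stacked_words)
{
    exception_frame_t f;
    assert(stacked_words != NULL);
    f.r0   = stacked_words[0];
    f.r1   = stacked_words[1];
    f.r2   = stacked_words[2];
    f.r3   = stacked_words[3];
    f.r12  = stacked_words[4];
    f.lr   = stacked_words[5];
    f.pc   = stacked_words[6];
    f.xpsr = stacked_words[7];
    return f;
}

/* xPSR[8:0] is the exception number that was active when the fault hit
 * (0 means Thread mode). */
uint32_t active_exception(const exception_frame_t *f)
{
    assert(f != NULL);
    return f->xpsr & 0x1FFu;
}
```

Applied to the values listed above, the stacked xPSR of `0x8100000B` gives an active exception number of 11. On ARMv7-M that number is assigned to SVCall, so the faulting code at `0x00000ACA` may have been running inside an SVC handler; treat that as a hint to check, not a conclusion.
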


tannewt commented 4 years ago

> Can you help me understand what you mean by "outside of arduino"?

I was thinking a plain C file built using gcc directly. This is my bias because I don't usually use the Arduino toolchain.

This doesn't sound like an issue with the actual code, since the error occurs so early. Did you try powering the board off and unplugging it? I've seen issues before where a reset doesn't reset everything.

hathach commented 4 years ago

@henrygab I ran into this issue today as well, so Ozone must have changed something. I am not entirely sure, but I think it is related to how the bootloader is loaded: https://forum.segger.com/index.php/Thread/6041-SOLVED-Ozone-Program-enters-inmediately-in-hardfault-handler-after-program-and-s/

One workaround is to use the "attach to running" option instead of "download and run".

henrygab commented 4 years ago

@hathach, THANK YOU! I had no idea that the bootloader's existence could cause this problem. With that site's help, here is a summary of a solution that works well:

  1. Determine the BootloaderBaseAddress. One method:

     * Open the "nRF Connect" application and load the "Programmer" app.
     * Connect to the device, then choose "Read" to read all the memory into the left graphical view.
     * Hover over the different sections until you find the one named "bootloader".
     * Note the lower value of the address range it occupies. For my first device, the range shown was `0x000F4000 - 0x000FB797`, so my `BootloaderBaseAddress` is **`0xF4000`**.

  2. Open the .jdebug file in a text editor.

  3. Add the following function at the head of the file, ensuring you change the noted line:

    void AfterTargetDownloadOrReset(void) {
      unsigned int SP;
      unsigned int PC;
      unsigned int VectorTableAddr;
      unsigned int BootloaderBaseAddress = 0xF4000; // <<==== EDIT THIS VALUE TO MATCH YOUR BOOTLOADER

      // Only used as a sanity check that the ELF file was loaded.
      VectorTableAddr = Elf.GetBaseAddr();

      if (VectorTableAddr == 0xFFFFFFFF) {
        Util.Log("Project file error: failed to get program base");
      } else {
        // First word of the bootloader's vector table: initial stack pointer.
        SP = Target.ReadU32(BootloaderBaseAddress);
        Target.SetReg("SP", SP);

        // Second word: reset handler, so execution starts in the bootloader.
        PC = Target.ReadU32(BootloaderBaseAddress + 4);
        Target.SetReg("PC", PC);
      }
    }

  4. Replace the following two functions (which should exist in the file) with the following:

    void AfterTargetReset (void) {
      AfterTargetDownloadOrReset();
    }
    void AfterTargetDownload (void) {
      AfterTargetDownloadOrReset();
    }

  5. Enjoy a stable debugging experience.
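
The script above relies on the Cortex-M vector table layout: the first word at the table base is the initial stack pointer and the second word is the reset handler address. Below is a minimal host-side sketch of the same two reads, assuming a little-endian flash image held in a byte buffer (for example, a dump saved from the Programmer app). The names `vector_head_t`, `read_le32`, and `read_vector_head` are hypothetical, and the sketch only mirrors what `Target.ReadU32(BootloaderBaseAddress)` and `Target.ReadU32(BootloaderBaseAddress + 4)` fetch on the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First two entries of a Cortex-M vector table. */
typedef struct {
    uint32_t initial_sp;     /* word 0: value loaded into SP at reset */
    uint32_t reset_handler;  /* word 1: address loaded into PC at reset */
} vector_head_t;

/* Read one little-endian 32-bit word from a raw byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Extract SP and reset handler from a vector table located at
 * table_offset bytes into the image. */
vector_head_t read_vector_head(const uint8_t *image, size_t image_len,
                               size_t table_offset)
{
    vector_head_t head;
    assert(image != NULL);
    assert(table_offset <= image_len && image_len - table_offset >= 8);
    head.initial_sp    = read_le32(image + table_offset);
    head.reset_handler = read_le32(image + table_offset + 4);
    return head;
}
```

Checking these two words in a dump before editing the .jdebug file is a quick way to confirm the chosen base address really holds a vector table: the first word should point into RAM, and the second should be an odd (Thumb) address inside the bootloader's flash range.
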

hathach commented 4 years ago

Great, thanks @henrygab for the great summary. Ozone certainly changed in a recent version, probably to try to catch out-of-bounds code and make HardFault debugging easier. Happy debugging :+1: