arduino / ArduinoCore-mbed

Unable to get Crash information from Giga #958

Closed dansiviter closed 1 month ago

dansiviter commented 1 month ago

I'm trying to debug an issue with the Giga crashing after a number of hours running. Unfortunately, it seems a number of issues are hampering this:

Has anyone got this working? Any help would be appreciated.

pennam commented 1 month ago

@dansiviter If I'm not wrong, printf should output to Serial1 (TX0/RX0).

If you need to override the macros you can add them to https://github.com/arduino/ArduinoCore-mbed/blob/main/variants/GIGA/conf/mbed_app.json#L10

they should be named:

pennam commented 1 month ago

Then you need to rebuild your libmbed.a file; you can follow this readme: https://github.com/arduino/ArduinoCore-mbed?tab=readme-ov-file#installation

dansiviter commented 1 month ago

Thanks @pennam. I've been fighting this for the last few hours and I just can't get it to build. It's stopping here:

(mbed) dansi:~/Arduino/hardware/arduino-git/mbed$ ./mbed-os-to-arduino GIGA:GIGA

MBED_CLEAN=0
MBED_UPDATE=0
APPLY_PATCHES=0
RESTORE_GDB_INFO=0
LOCAL_REPO=
REMOTE_BRANCH=
MBED_CORE_LOCATION=/mnt/c/Users/DanielSiviter/Documents/Arduino/hardware/arduino-git/mbed

VARIANT=GIGA BOARD=GIGA
Checking for prerequisites... done.
Creating MbedOS Application... done.
Checking out preferred 'mbed-os' version... done.
Setting up Mbed Application...[mbed] Working path "/tmp/mbed-os-program" (program)
[mbed] GIGA now set as default target in program "mbed-os-program"
[mbed] Working path "/tmp/mbed-os-program" (program)
[mbed] GCC_ARM now set as default toolchain in program "mbed-os-program"
 done.
Compiling Mbed Application...[mbed] ERROR: The mbed tools were not found in "/tmp/mbed-os-program".
       You can run "mbed deploy" to install dependencies and tools.

Any pointers to address this?

pennam commented 1 month ago

You are missing some mbed tools in your PATH. After you get this error, move into the "/tmp/mbed-os-program" directory and type

mbed compile

You should get a more detailed error, and running mbed deploy in that directory should probably fix your issue.

dansiviter commented 1 month ago

Getting further but still issues:

argument -m/--mcu: GIGA is not a supported MCU. Supported MCUs are:
ADV_WISE_1510,          ADV_WISE_1570,          ARCH_MAX,
ARCH_PRO,               ARDUINO_NANO33BLE,      ARM_CM3DS_MPS2,
ARM_MPS2_M0,            ARM_MPS2_M0P,           ARM_MPS2_M3,
ARM_MPS2_M4,            ARM_MPS2_M7,            ARM_MUSCA_B1,
ARM_MUSCA_S1,           B_L4S5I_IOT01A,         B_U585I_IOT02A,
CY8CKIT064B0S2_4343W,   CY8CKIT_062S2_43012,    CY8CKIT_062_BLE,
CY8CKIT_062_WIFI_BT,    CY8CPROTO_062S3_4343W,  CY8CPROTO_062_4343W,
CYSBSYSKIT_01,          CYTFM_064B0S2_4343W,    CYW9P62S1_43012EVB_01,
CYW9P62S1_43438EVB_01,  DISCO_F413ZH,           DISCO_F429ZI,
DISCO_F469NI,           DISCO_F746NG,           DISCO_F769NI,
DISCO_H747I,            DISCO_H747I_CM4,        DISCO_H747I_CM7,
DISCO_L072CZ_LRWAN1,    DISCO_L475VG_IOT01A,    DISCO_L476VG,
DISCO_L496AG,           DISCO_L4R9I,            DISCO_L562QE,
DISCO_WB5MMG,           EFM32GG11_STK3701,      EFM32GG_STK3700,
EP_AGORA,               EP_ATLAS,               EV_COG_AD3029LZ,
EV_COG_AD4050LZ,        FF1705_L151CC,          FF_LPC546XX,
FVP_MPS2_M0,            FVP_MPS2_M0P,           FVP_MPS2_M3,
FVP_MPS2_M4,            FVP_MPS2_M7,            GD32_F307VG,
GD32_F450ZI,            GR_LYCHEE,              GR_MANGO,
HEXIWEAR,               K22F,                   K64F,
K66F,                   K82F,                   KL25Z,
KL43Z,                  KL46Z,                  KW41Z,
LPC1114,                LPC1768,                LPC54114,
LPC546XX,               MAX32620FTHR,           MAX32625MBED,
MAX32625PICO,           MAX32630FTHR,           MAX32660EVSYS,
MAX32670EVKIT,          MIMXRT1050_EVK,         MIMXRT1170_EVK,
MOTE_L152RC,            MTS_DRAGONFLY_F411RE,   MTS_DRAGONFLY_F413RH,
MTS_DRAGONFLY_L471QG,   MTS_DRAGONFLY_L496VG,   MTS_MDOT_F411RE,
NRF52840_DK,            NRF52_DK,               NUCLEO_F070RB,
NUCLEO_F072RB,          NUCLEO_F091RC,          NUCLEO_F103RB,
NUCLEO_F207ZG,          NUCLEO_F303K8,          NUCLEO_F303RE,
NUCLEO_F303ZE,          NUCLEO_F401RE,          NUCLEO_F411RE,
NUCLEO_F412ZG,          NUCLEO_F413ZH,          NUCLEO_F429ZI,
NUCLEO_F439ZI,          NUCLEO_F446RE,          NUCLEO_F446ZE,
NUCLEO_F722ZE,          NUCLEO_F746ZG,          NUCLEO_F756ZG,
NUCLEO_F767ZI,          NUCLEO_G031K8,          NUCLEO_G071RB,
NUCLEO_G0B1RE,          NUCLEO_G431KB,          NUCLEO_G431RB,
NUCLEO_G474RE,          NUCLEO_H723ZG,          NUCLEO_H743ZI2,
NUCLEO_H7A3ZI_Q,        NUCLEO_L073RZ,          NUCLEO_L152RE,
NUCLEO_L432KC,          NUCLEO_L433RC_P,        NUCLEO_L452RE_P,
NUCLEO_L476RG,          NUCLEO_L486RG,          NUCLEO_L496ZG,
NUCLEO_L496ZG_P,        NUCLEO_L4R5ZI,          NUCLEO_L4R5ZI_P,
NUCLEO_L552ZE_Q,        NUCLEO_U575ZI_Q,        NUCLEO_WB15CC,
NUCLEO_WB55RG,          NUCLEO_WL55JC,          NUMAKER_IOT_M252,
NUMAKER_IOT_M263A,      NUMAKER_IOT_M467,       NUMAKER_IOT_M487,
NUMAKER_PFM_M453,       NUMAKER_PFM_M487,       NUMAKER_PFM_NANO130,
NUMAKER_PFM_NUC472,     NU_M2354,               PORTENTA_H7_M4,
PORTENTA_H7_M7,         RHOMBIO_L476DMW1K,      RZ_A1H,
S1SBP6A,                S5JS100,                SDP_K1,
SDT32620B,              SDT32625B,              SDT52832B,
SDT64B,                 SFE_ARTEMIS,            SFE_ARTEMIS_ATP,
SFE_ARTEMIS_DK,         SFE_ARTEMIS_MODULE,     SFE_ARTEMIS_NANO,
SFE_ARTEMIS_THING_PLUS, SFE_EDGE,               SFE_EDGE2,
TB_SENSE_12,            TMPM46B,                TMPM4G9,
TMPM4GR,                TMPM4KN,                TMPM4NR,
UHURU_RAVEN,            WIO_3G,                 WIO_BG96,
WIO_EMW3166,            XDOT_L151CC

pennam commented 1 month ago

You need to apply our patches before building, so you need to use the -a flag:

./mbed-os-to-arduino -a GIGA:GIGA

dansiviter commented 1 month ago

Thanks @pennam; I'm getting closer. It seems I'm able to build the project. However, in the Arduino IDE compilation now fails due to:

.\Documents\Arduino\hardware\arduino-git\mbed\variants\GIGA/libs/libmbed.a(except.o): In function `Fault_Handler_Continue2':
except.S:(.text+0x60): undefined reference to `__CRASH_DATA_RAM_START__'

I'm using a lightly modified version of the Mbed Crash Reporting Example in the Arduino IDE. There is mention of this in the Mbed docs but I'm not sure how to use it. Again, any help much appreciated.

dansiviter commented 1 month ago

I'm down the rabbit hole now... I've modified the mbed-os-to-arduino script to inject a .crash_data_ram section into linker_script.ld (as per this). This now compiles, but when an exception is thrown, it just seems to hang and doesn't crash. This is also mentioned in the issue, so I increased rtos.main-thread-stack-size by 0x100 (256), but it still just hangs.

dansiviter commented 1 month ago

I've dropped platform.error-hist-enabled and it's now crashing (i.e. red flashing LED). However, it's still not outputting the error and still not restarting. I'm monitoring both Serial and Serial1 but I only get:

This is the crash reporting Mbed OS example
1st run: Inject the fault exception

Any ideas what could be going wrong?

FYI Looks like #908 is related and codebase is ARMmbed/mbed-os-example-crash-reporting.

dansiviter commented 1 month ago

Can anyone assist?

schnoberts1 commented 1 month ago

Yeah, it doesn't seem obvious does it @dansiviter . I have the same issue:

andy@beast:~/code/dbc/main/3rdparty/ArduinoCore-mbed$ git diff variants/GIGA/conf/mbed_app.json
diff --git a/variants/GIGA/conf/mbed_app.json b/variants/GIGA/conf/mbed_app.json
index 0fb7e922..dcd2b403 100644
--- a/variants/GIGA/conf/mbed_app.json
+++ b/variants/GIGA/conf/mbed_app.json
@@ -8,6 +8,10 @@
       "platform.callback-nontrivial": true,
       "platform.all-stats-enabled": true,
       "platform.memory-tracing-enabled": true,
+      "platform.crash-capture-enabled": true,
+      "platform.error-hist-enabled": true,
+      "platform.fatal-error-auto-reboot-enabled": true,
+      "platform.error-reboot-max": 2,
       "rtos.main-thread-stack-size": 32768,
       "cordio.max-connections": 5,
       "target.mbed_app_start": "0x8040000",
andy@beast:~/code/dbc/main/3rdparty/ArduinoCore-mbed$ git diff mbed-os-to-arduino
diff --git a/mbed-os-to-arduino b/mbed-os-to-arduino
index ef911b5e..3371be06 100755
--- a/mbed-os-to-arduino
+++ b/mbed-os-to-arduino
@@ -229,7 +229,7 @@ generate_flags () {
                        sed -i 's/LENGTH = 0x200000/LENGTH = CM4_BINARY_END - CM4_BINARY_START/g' "$ARDUINOVARIANT"/linker_script.ld
                        sed -i 's/LENGTH = 0x1c0000/LENGTH = CM4_BINARY_START - 0x8040000/g' "$ARDUINOVARIANT"/linker_script.ld
                fi
-               if [[ $ARDUINOVARIANT == *NANO_RP2040* ]]; then
+       if [[ $ARDUINOVARIANT == *NANO_RP2040* ]]; then
                        set +e
                        HAS_2NDSTAGE_SECTION=`grep second_stage_ota "$ARDUINOVARIANT"/linker_script.ld`
                        set -e
@@ -242,6 +242,22 @@ generate_flags () {
                        fi
                fi
        done
+                       if [[ $ARDUINOVARIANT == *GIGA* ]]; then
+      CRASH_SECTION=".crash_data_ram : \n \
+    { \n \
+        . = ALIGN(8); \n \
+        __CRASH_DATA_RAM__ = .; \n \
+        __CRASH_DATA_RAM_START__ = .; \n \
+        KEEP(*(.keep.crash_data_ram)) \n \
+        *(.m_crash_data_ram) \n \
+        . += 0x100; \n \
+        . = ALIGN(8); \n \
+        __CRASH_DATA_RAM_END__ = .; \n \
+    } > RAM \n"
+    sed -i "s/_sidata = .;/_sidata = .;\n${CRASH_SECTION}/"  "$ARDUINOVARIANT"/linker_script.ld
+    echo PATCH CRASH
+    cat "$ARDUINOVARIANT"/linker_script.ld
+    fi
        echo " done."
 }
andy@beast:~/code/dbc/main/3rdparty/ArduinoCore-mbed$ git diff variants/GIGA/linker_script.ld
diff --git a/variants/GIGA/linker_script.ld b/variants/GIGA/linker_script.ld
index 8941b72a..e38c8a2b 100644
--- a/variants/GIGA/linker_script.ld
+++ b/variants/GIGA/linker_script.ld
@@ -49,6 +49,18 @@ SECTIONS
     __exidx_end = .;
     __etext = .;
     _sidata = .;
+.crash_data_ram :
+     {
+         . = ALIGN(8);
+         __CRASH_DATA_RAM__ = .;
+         __CRASH_DATA_RAM_START__ = .;
+         KEEP(*(.keep.crash_data_ram))
+         *(.m_crash_data_ram)
+         . += 0x100;
+         . = ALIGN(8);
+         __CRASH_DATA_RAM_END__ = .;
+     } > RAM
+
     .data : AT (__etext)
     {
         __data_start__ = .;

... and I see the same issue, doesn't reboot. Hangs.

No doubt there's something I'm missing here.
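
For anyone reading the `.crash_data_ram` fragment in the diff above, the location-counter arithmetic can be checked on a host machine. This is only a sketch: `align_up`, `crash_start` and `crash_end` are made-up helper names, the addresses in the usage note are illustrative, and it assumes the three steps visible in the fragment (align to 8, reserve 0x100 bytes after any matching input sections, align to 8 again).

```cpp
#include <cstdint>

// Round addr up to the next multiple of alignment (a power of two),
// which is what `. = ALIGN(8);` does to the linker's location counter.
constexpr std::uint32_t align_up(std::uint32_t addr, std::uint32_t alignment) {
    return (addr + alignment - 1u) & ~(alignment - 1u);
}

// Hypothetical helper: value __CRASH_DATA_RAM_START__ would take for a
// given location counter on entry to the section.
constexpr std::uint32_t crash_start(std::uint32_t location_counter) {
    return align_up(location_counter, 8u);
}

// Hypothetical helper: value __CRASH_DATA_RAM_END__ would take, where
// input_bytes is whatever the matching input sections contribute.
constexpr std::uint32_t crash_end(std::uint32_t location_counter,
                                  std::uint32_t input_bytes) {
    return align_up(crash_start(location_counter) + input_bytes + 0x100u, 8u);
}
```

With no matching input sections, `crash_end(0x24000003, 0) - crash_start(0x24000003)` comes out as 0x100, so under these assumptions the fragment reserves exactly 256 bytes between the two symbols.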

dansiviter commented 1 month ago

@schnoberts1 I was just writing a response! :D

I had a bit of assistance from over on the Arduino Forum. Two things that will hopefully help others:

MBED_NORETURN void mbed_die() {
  // flash LEDs
  NVIC_SystemReset();
}

dansiviter commented 1 month ago

*sigh* After a bit of digging, the documentation does state:

If application implementation needs to receive this callback, mbed_error_reboot_callback function should be overridden with custom implementation. By default it's defined as a WEAK function in mbed_error.c.

And one assumed that setting MBED_CONF_PLATFORM_FATAL_ERROR_AUTO_REBOOT_ENABLED=1 would mean it rebooted... but alas, it just means it will call the callback and the default implementation does nothing: /ARMmbed/mbed-os/blob/master/platform/source/mbed_error.c#L222C2-L225C2

Not sure why there is both build config and implementation required to reboot. Seems illogical!

schnoberts1 commented 1 month ago

Isn't mbed_error_reboot_callback() called after the reboot, not before it? It's invoked by mbed_error_initialize() below it. I don't see how implementing this or not affects the decision to reboot, unless there's some code that figures out whether mbed_error_reboot_callback() was redefined.
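
The ordering being described here can be modelled on a host machine. The sketch below is a simplified stand-in, not Mbed's code: `CrashRecord`, `on_fatal_error`, `boot`, `reboot_callback` and the toy checksum are all made-up names. It assumes only what the thread states (the callback runs from the initialisation path after the reset) plus one labelled guess, that the `platform.error-reboot-max` limit is enforced on the boot path.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for the context kept in crash-data RAM.
// Field names are illustrative, not the real mbed_error_ctx layout.
struct CrashRecord {
    std::int32_t  status = 0;        // negative means "an error was captured"
    std::uint32_t reboot_count = 0;  // reboots caused by this error so far
    std::uint32_t checksum = 0;      // guards against stale or garbage RAM
};

static CrashRecord retained;         // models RAM that survives a soft reset
static int callback_calls = 0;       // how often the reboot callback ran

static std::uint32_t checksum_of(const CrashRecord &r) {
    return static_cast<std::uint32_t>(r.status) ^ r.reboot_count ^ 0xA5A5A5A5u;
}

// Stand-in for an application's override of the weak reboot callback.
static void reboot_callback(const CrashRecord &) { ++callback_calls; }

// Fault path: record the error, then reset. No callback is involved here.
static void on_fatal_error(std::int32_t status) {
    assert(status < 0);
    retained.status = status;
    retained.reboot_count += 1;
    retained.checksum = checksum_of(retained);
    // A real target would reset at this point; the model simply returns.
}

// Boot path: returns false when the reboot limit is reached
// (assumption: the system would halt instead of carrying on).
static bool boot(std::uint32_t reboot_max) {
    const bool valid = retained.status < 0 &&
                       retained.checksum == checksum_of(retained);
    if (!valid) return true;         // clean boot, nothing to report
    reboot_callback(retained);       // the callback fires after the reset
    return retained.reboot_count < reboot_max;
}
```

In this model, overriding the callback changes what happens on the next boot but has no say in whether the reset itself takes place, which is the point being made above.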

megacct commented 1 month ago

I believe mbed is doing a soft reset (to maintain memory) so it can collect info for the report. It then sends the report and goes into the _die() loop.

schnoberts1 commented 1 month ago

I think my issue may be something different anyway. Tracing through the fault handler, I saw that it prints to Serial. Mine doesn't; it just hangs. I tested mbed_error_puts by calling it in my main function and it crashes for me. On investigation, I think it's because I've compiled mbed with the develop profile (i.e. NDEBUG is not defined). This enables printing to Serial in MBED_ASSERT, and what I see is the USB serial write getting a null semaphore because it's in either an exception state or IRQs are masked:

osSemaphoreId_t osSemaphoreNew (uint32_t max_count, uint32_t initial_count, const osSemaphoreAttr_t *attr) {
  osSemaphoreId_t semaphore_id;

  EvrRtxSemaphoreNew(max_count, initial_count, attr);
  if (IsException() || IsIrqMasked()) {
    EvrRtxSemaphoreError(NULL, (int32_t)osErrorISR); // <-- fails here
    semaphore_id = NULL;
  } else {
    semaphore_id = __svcSemaphoreNew(max_count, initial_count, attr);
  }
  return semaphore_id;
}

which triggers an assert in here:


void Semaphore::constructor(int32_t count, uint16_t max_count)
{
#if MBED_CONF_RTOS_PRESENT
    osSemaphoreAttr_t attr = { 0 };
    attr.cb_mem = &_obj_mem;
    attr.cb_size = sizeof(_obj_mem);
    _id = osSemaphoreNew(max_count, count, &attr);
    MBED_ASSERT(_id != nullptr); // <-- crash
#else
    _count = count;
    _max_count = max_count;
#endif
}

which is going to call mbed_error_printf, which will then (eventually) call the USB serial write that tries to get a Semaphore, and that will fail again, and so on and so on.

As a result, none of the fault handling happens, since it's just going round in circles.
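
The loop can be reproduced in miniature on a host. This is a toy model only, with made-up names (`irq_masked`, `semaphore_new`, `console_write`, `error_print`, `kDepthCap`); it assumes the console needs a fresh semaphore on every write, as described above, and it adds an artificial depth cap so the demonstration terminates. A real target has no such cap.

```cpp
#include <cassert>

static bool irq_masked = false;   // models IsException() || IsIrqMasked()
static int  print_attempts = 0;   // how many times the error printer ran
static const int kDepthCap = 5;   // demo-only guard so the model terminates

static void error_print(const char *msg);

// Creating a semaphore is refused while IRQs are masked (returns null).
static const void *semaphore_new() {
    static const int token = 0;
    return irq_masked ? nullptr : &token;
}

// Console write that needs a fresh semaphore for every call.
static bool console_write(const char *msg) {
    if (semaphore_new() == nullptr) {
        // The failed assert tries to report itself...
        error_print("assert: semaphore id is null");
        return false;
    }
    (void)msg;                    // a real console would transmit msg here
    return true;
}

static void error_print(const char *msg) {
    assert(msg != nullptr);
    if (++print_attempts >= kDepthCap) return;  // break the loop for the demo
    console_write(msg);           // ...through the very same console
}
```

With `irq_masked` set, a single `console_write` call drives `print_attempts` all the way up to the cap without ever emitting anything, which matches the "round in circles" behaviour described here.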

The bigger question is why the system thinks it's in an exception or masked IRQ state at that point. I suspect this is my real issue.

Happy days.

dansiviter commented 1 month ago

Apologies, you're right. But I find that even more confusing!

A few other interesting issues I've found using the released mbed ArduinoCore:

I thought REDIRECT_STDOUT_TO would be helpful, but it seems to cause more problems.

schnoberts1 commented 1 month ago

In fact I'm not sure how the mbed error messages can ever work when mbed isn't compiled in release mode. See this:

void mbed_error_puts(const char *str)
{
    // Writing the string to the console in a critical section is
    // potentially beneficial - for example in BufferedSerial it
    // forces the "unbuffered" mode that makes sure all characters
    // go out now. If we made the call not in a critical section,
    // it would go to the software buffer and we would be reliant
    // on platform.stdio-flush-at-exit forcing a fsync before
    // entering mbed_die().
    //
    // But this may be the very first write to the console, and hence
    // require it to be initialized - doing this in a critical
    // section could be problematic. So we prime it outside the
    // critical section with a zero-length write - this forces
    // the initialization.
    //
    // It's still possible that we were in a critical section
    // or interrupt on entry anyway (eg if this is an error coming
    // from inside RTX), so in other areas of the system we suppress
    // things like mutex creation asserts and RTX traps while
    // an error is in progress, so that console initialization
    // may work.
    write(STDERR_FILENO, str, 0);

    core_util_critical_section_enter();

core_util_critical_section_enter masks interrupts (disables them). This means any semaphore construction will fail until the end of that section, which is the end of mbed_error_puts. Surely this means any non-release mode build of mbed just recurses to death when this function is called and USB serial requests a semaphore?
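
The "prime it outside the critical section" idea from the quoted comment can be sketched as a host-side model. Everything here is hypothetical (`critical_enter`, `critical_exit`, `console_init`, `lazy_console_write`, `error_puts` are made-up names), and it assumes a console that needs an RTOS object only once, at initialisation.

```cpp
#include <cassert>

static int  critical_depth = 0;      // nesting counter; > 0 means IRQs masked
static bool console_ready  = false;  // lazily initialised console

static void critical_enter() { ++critical_depth; }
static void critical_exit()  { assert(critical_depth > 0); --critical_depth; }

// Initialisation needs an RTOS object, which cannot be created while masked.
static bool console_init() {
    if (critical_depth > 0) return false;
    console_ready = true;
    return true;
}

// Even a zero-length write triggers initialisation: that is the priming trick.
static bool lazy_console_write(const char *str, int len) {
    if (!console_ready && !console_init()) return false;
    (void)str;
    (void)len;                       // a real console would transmit here
    return true;
}

// Shape of the error path: optionally prime, then write inside the section.
static bool error_puts(const char *str, int len, bool prime) {
    if (prime) lazy_console_write(str, 0);
    critical_enter();
    const bool ok = lazy_console_write(str, len);
    critical_exit();
    return ok;
}
```

In the model, priming rescues the write only because nothing else is created per call. If the driver builds a new semaphore on every write, as reported for USB serial in this thread, priming would not help.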

schnoberts1 commented 1 month ago

...and I can confirm that once I use a release mbed build auto-reboot on hard fault starts to work. Now to work out why the error context isn't set :) Worth noting you still can't call mbed_error_puts in a release build because it blows up with a zero Semaphore on the Giga due to it having masked interrupts before the USB driver creates a semaphore.

I think this is one of my chief frustrations with mbed. Nothing seems to work quite right because there are all these undocumented edge cases.

[EDIT] context is now set fine.

megacct commented 1 month ago

Interesting! I've been compiling with the RELEASE profile for ages now so just ran a #DIV/0! test on my setup using the stock libmbed.a.

My mbed_die() still worked as expected and the fault report went to my log but I got more info.


++ MbedOS Fault Handler ++

FaultType: HardFault

Context:
R0   : 00000370
R1   : 2404A900
R2   : E000ED00
R3   : 00070210
R4   : 2400A5B0
R5   : 24009FD8
R6   : 016CAD3A
R7   : 40000C00
R8   : 240017D8
R9   : 00000001
R10  : 00000000
R11  : 016CAD3D
R12  : 08061799
SP   : 24055350
LR   : 08061A17
PC   : 0804C776
xPSR : 21030000
PSP  : 240552E8
MSP  : 2407FF78
CPUID: 411FC271
HFSR : 40000000
MMFSR: 00000000
BFSR : 00000000
UFSR : 00000001
DFSR : 00000000
AFSR : 00000000
Mode : Thread
Priv : Privileged
Stack: PSP

-- MbedOS Fault Handler --

++ MbedOS Error Info ++
Error Status: 0x80FF013D Code: 317 Module: 255
Error Message: Fault exception
Location: 0x804C776
Error Value: 0x24057228
Current Thread: main Id: 0x2404A900 Entry: 0x80629F9 StackSize: 0x8000 StackMem: 0x2404D3A8 SP: 0x24055350 
For more info, visit: https://mbed.com/s/error?error=0x80FF013D&osver=61700&core=0x411FC271&comp=2&ver=90300&tgt=GIGA
-- MbedOS Error Info -- 
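
A few of the values in the dump above can be picked apart with plain bit arithmetic. This is a hedged sketch: the helper names are made up, the error-status layout is inferred from the printed line (status 0x80FF013D alongside "Code: 317 Module: 255"), and it assumes the HFSR and UFSR lines show the raw register values. The flag positions are the Armv7-M architectural ones.

```cpp
#include <cstdint>

// Error status fields, inferred from the dump: low 16 bits are the code,
// the next 8 bits the module, and the top bit marks an error.
constexpr std::uint32_t error_code(std::uint32_t status) {
    return status & 0xFFFFu;
}
constexpr std::uint32_t error_module(std::uint32_t status) {
    return (status >> 16) & 0xFFu;
}
constexpr bool is_error(std::uint32_t status) {
    return (status >> 31) != 0u;
}

// Armv7-M fault status flags.
constexpr std::uint32_t HFSR_FORCED     = 1u << 30;  // escalated configurable fault
constexpr std::uint32_t UFSR_UNDEFINSTR = 1u << 0;   // undefined instruction
constexpr std::uint32_t UFSR_DIVBYZERO  = 1u << 9;   // integer divide-by-zero trap

constexpr bool has_flag(std::uint32_t reg, std::uint32_t flag) {
    return (reg & flag) != 0u;
}
```

Applied to the dump, `error_code(0x80FF013D)` gives 317 and `error_module(0x80FF013D)` gives 255, matching the printed line. Under the stated assumption, HFSR 0x40000000 has FORCED set and UFSR 0x00000001 reads as UNDEFINSTR rather than DIVBYZERO.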

I don't normally get the `MbedOS Error Info` detail.

megacct commented 1 month ago

Should this issue be closed and any discussion continued in the forum?

schnoberts1 commented 1 month ago

I'll open a ticket related to the fact it doesn't seem to work at all in develop and debug profiles due to the issue I highlighted.

dansiviter commented 1 month ago

@megacct On the original topic of being able to extract crash information: yes, it can be closed, as it is possible (albeit undocumented) via Tx0/Rx0 @ 115,200. I think this thread has highlighted that there are all sorts of issues in the integration between the Arduino and mbed APIs that make it very complicated to do. However, it's all probably moot with the EoL of mbed and the move to Zephyr.

megacct commented 1 month ago

@dansiviter - agreed. Having migrated from a mega, all I wanted was more speed and memory. I really didn't need the complexity or API overhead of a RTOS (or a co-processor) but I understand why Arduino went that way rather than bare-metal. Won't be migrating - it's working now and I'm mostly happy