MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0
16.22k stars 19.22k forks source link

[BUG] STM32F1 watchdog is not enabled. Boards freeze with heaters on. #18226

Closed minosg closed 4 years ago

minosg commented 4 years ago

Bug Description

The watchdog initialization function in the HAL for STM32F is commented out, resulting in non operational watchdog. When Marlin crashes on those boards, they stop mid-print with the heaters on. This affects among others the following board:

Steps to Reproduce

  1. Compile latest Marlin release (2.5.3.0 or bugfix release)
  2. Run a high retraction model with low E-Jerk ( Extruder Jerk set to 2) or do anything else that is guaranteed to crash the FW.
  3. Check the nozzle stuck in the model and verify that heaters and fans are running.

Expected behavior:

The watchdog should kick in and reset the board

Actual behavior:

The nozzle bakes some pla cake

Additional Information

The code can be seen at:

watchdog.cpp

I can confirm that on BTT SKR mini e3 the watchdog can be enabled by removing comments from this line

/**
 * @brief  Initialized the independent hardware watchdog.
 *
 * @return No return
 *
 * @details The watchdog clock is 40Khz. We need a 4 seconds interval, so use a /256 preescaler and 625 reload value (counts down to 0)
 */
void watchdog_init() {
  iwdg_init(IWDG_PRE_256, STM32F1_WD_RELOAD);
}

I have no knowledge why this line was commented out but it appears it has been so since that particular has been introduced in e9acb63290f068cb9dd75acb3c042bc6bccd6616

xC0000005 commented 4 years ago

The maple core hard hangs if the watchdog is enabled via option bytes (the ST core does not). That's the root cause of this, but I agree, that should be enabled.

xC0000005 commented 4 years ago

If you have a way to test a change, I have a branch with a fix that reads the option byte and enables it correctly.

minosg commented 4 years ago

I will be testing the change from your branch. I will report back on how it worked on my system but some Stm core users may have to test it as well

thisiskeithb commented 4 years ago

I think this is important enough to keep open for now.

minosg commented 4 years ago

@thisiskeithb thanks. When I'm back to civilization I will check @xC0000005's fork and verify if the changes he is bringing in works in my boards

xC0000005 commented 4 years ago

Note that any board with the option watchdog set still has watchdog. It’s software watchdog that is broken.

Sent from my iPhone

On Jul 12, 2020, at 12:58 PM, minosg notifications@github.com wrote:

 @thisiskeithb thanks. When I'm back to civilization I will check @xC0000005's fork and verify if the changes he is bringing in works in my boards

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

thinkyhead commented 4 years ago

I can confirm that on BTT SKR mini e3 the watchdog can be enabled…

Which hardware permits the use of…

iwdg_init(IWDG_PRE_256, STM32F1_WD_RELOAD);

…and which hardware does not? It sounds like a new flag is needed in pins_MY_BOARD.h or as a -D option on the appropriate targets.

xC0000005 commented 4 years ago

Some background - the old maple core had a bug where if the watchdog was started via option byte, maple hard-hung attempting to enable it. If you read the option bytes on your board, it probably has software watchdog, not hardware. It could be that the core bug was fixed and we never knew. Back when this was disabled, you could set the hardware option byte on a bluepill, compile, and watch it hang, disable it and everything worked. The right fix is at the HAL layer to read the software option byte, but I haven’t had a chance to test it.

On Jul 12, 2020, at 1:07 PM, Scott Lahteine notifications@github.com wrote:

I can confirm that on BTT SKR mini e3 the watchdog can be enabled…

Which hardware permits the use of

iwdg_init(IWDG_PRE_256, STM32F1_WD_RELOAD); and which hardware does not? It sounds like a new flag is needed in pins_MY_BOARD.h or as a -D option on the appropriate targets.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MarlinFirmware/Marlin/issues/18226#issuecomment-657268945, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHVGS4PCLHDRMXAAFW6UVKLR3IJX7ANCNFSM4NYGQEMQ.

minosg commented 4 years ago

@xC0000005 As discussed offline, I enabled the hardware watchdog using option bytes, and and can confirm that the maple core hung as expected. Which means that enabling this line in boards with the hardware watchdog is not an option.

xC0000005 commented 4 years ago

Please try this and see if it behaves correctly: https://github.com/xC0000005/Marlin/commit/6c3ce24513a48d88a2ead07301ff6dbd920f5c77

On Jul 19, 2020, at 4:25 PM, minosg notifications@github.com wrote:

maple

minosg commented 4 years ago

@xC0000005

I may fail to see the scope of the above patch. According to this document the OPTCR and FLASH_OPTCR_WDG_SW register references are for the STM32F2 not the F1

For this platform shouldn't it be written as:

void watchdog_init() {
  if (FLASH_BASE->OBR) & FLASH_OBR_WDG_SW == 0) {
    iwdg_init(IWDG_PRE_256, STM32F1_WD_RELOAD);
  }
}

?

And would the idea be that if the hardware watchdog is enabled by option bytes do nothing ( not activate the watchdog) while if it is not, use the software wathdog init?

At any rate I have tested the proposed code which fails to compile on latest bugfix, and my version of the equivalent which did compile but not work. Enabling the option byte still stopped the board from booting up.

minosg commented 4 years ago

Ok coming back after having worked on this issue a bit more something is very off. For this platform you can use the ST's Software or Hardware watchdog, which is enable by Option Bytes set in in the OBR register. The problem which @xC0000005 described occurs when you enable the hardware watchdog which hangs.

Now the fun part. What he proposed was a simple test, to verify if the option byte is not-set,indicative of using a hardware watchdog, and if so do nothing. Or if the option byte is set, using a software watchdog to initialize it using the following line of code:

wdg_init(IWDG_PRE_256, STM32F1_WD_RELOAD);

This could be achieved by a test described above as:

  if (((FLASH_BASE)->OBR& (FLASH_OBR_WDG_SW)) != 0) {
    iwdg_init(IWDG_PRE_256, STM32F1_WD_RELOAD);
  }

The problem

The problem lies that the above code does not work. And the reason which it does not work is that for at least my platform, the flash reg map is malformed, probably missing one register entry

/** @brief STM32F1 Flash register map type */
typedef struct flash_reg_map {
    __IO uint32 ACR;            /**< Access control register */
    __IO uint32 KEYR;           /**< Key register */
    __IO uint32 OPTKEYR;        /**< OPTKEY register */
    __IO uint32 SR;             /**< Status register */
    __IO uint32 CR;             /**< Control register */
    __IO uint32 AR;             /**< Address register */
    __IO uint32 OBR;            /**< Option byte register */
    __IO uint32 WRPR;           /**< Write protection register */
} flash_reg_map;

#define FLASH_BASE                      ((struct flash_reg_map*)0x40022000)

Since all members are 32bit registers there is no padding involved.

The OptionByte register should be in 0x4002201C location while it in our case it points to 0x40022018

This can be manually verified by using pointed to peek at the register and toggling the nRST_STOP ( it is a safe bit to test with )

volatile uint32_t * obr_shortcut = (volatile uint32_t *)0x4002201C;
volatile uint32_t wgd_val= *obr_shortcut & FLASH_OBR_WDG_SW;
volatile uint32_t_nrst_val= *obr_shortcut & FLASH_OBR_nRST_STOP;

Rewriting the above conditional with the incorrect structre member(which points to the right address) and using the nRST as a check, the logic has been verified to be working .

  if (((FLASH_BASE)->WRPR & (FLASH_OBR_nRST_STOP)) != 0) {
    iwdg_init(IWDG_PRE_256, STM32F1_WD_RELOAD);
  }

But trying the same for the watchdog big still crashes maple core

  if (((FLASH_BASE)->WRPR & (FLASH_OBR_WDG_SW)) != 0) {
    iwdg_init(IWDG_PRE_256, STM32F1_WD_RELOAD);
  }

Proposal

I am not fully aware of which platforms come with the option byte and weather that is worth the safety risk of disabling the watchdog the most common platforms we currently have. We could guide the users on how to disable it using ST's software if their new platform crashes on boot-up.

But I am not quite sure if it worth untangling and debugging yet another maple core framework issue, which is intended to be phased out gradually if I am not wrong.

Again though I am a Marlin noob, so I am just sharing my findings.

github-actions[bot] commented 4 years ago

This issue is stale because it has been open 30 days with no activity. Remove stale label / comment or this will be closed in 5 days.

Plasmarobo commented 4 years ago

This seems to effect Creality v4.2.2 boards (Ender 3, Ender 5, Ender 5 Pro).

Any chance of a fix? I really don't want to burn my house down, my printer hung in the middle of a print with both the headed bed and extruder on, when I power cycled it Thermal Runaway was triggered, but too little too late.

I plan to experiment with these settings/hack a fix into the Maple core, but this seems pretty critical to me: not fixing it can and will cause a fire eventually.

sjasonsmith commented 4 years ago

Now that we have fixed a major cause of hangs, I think we need to come back to this and try to get the watchdog working everywhere. Re-opening.

@Plasmarobo please make sure you are using Marlin 2.0.7 (or latest bugfix). Hangs involving serial ports were recently fixed for these boards.

qwewer0 commented 4 years ago

Are there settings or a sequence of tasks that could bring the issue out, for testing?

sjasonsmith commented 4 years ago

@qwewer0 if you want a way to trigger the watchdog, you could modify some GCode to simply disable interrupts then spin in a loop. If Marlin is behaving properly it will never hang on it’s own, so you have to contrive your own hang.

minosg commented 4 years ago

@sjasonsmith I believe the core issue is the enabling the watchdog part not the testing it actually works when enabled.

As @xC0000005 described above you can have this plaftom configured to use a hardware or a software watchdog.

The commented out line will work just fine and enable watchdog in platforms configured for software watchdog, but it will not even boot marlin if you use a hardware wd.

The idea is to create a logic fork which works like that:

sjasonsmith commented 4 years ago

Do any actual boards in use enable the HW Watchdog in the option bytes? When I enable it on my SKR E3 DIP is immediately hangs on boot, it doesn't matter whether or not I am initializing the watchdog through software.

If we are disabling watchdogs for every STM32F1 user just so that one person can run a printer from a Bluepill, then we should just turn it on leave it up to that one Bluepill user to fix it for their board, or to just change their option bits.

If we actually know about real 3D printer boards enabling the HW Watchdog, I'd like to know what they are.

I have checked all the following boards, and none of them enable HW Watchdog in option bits:

The following were inconclusive because I wasn't able to easily connect to them over SWD. Most likely their firmwares have debug pins disabled and without reset buttons I cannot easily catch them in a reboot.

sjasonsmith commented 4 years ago

Personally, I am willing to risk hanging some boards at boot as long as we are not bricking them. Nobody is going to start a fire because their printer won't boot.

minosg commented 4 years ago

@sjasonsmith I am with you on this one. BTT and MKS appear to be the most popular boards, and most likely to attract new users because of the support by youtube tutorials. Those users are not guaranteed to be familiar with a watchdog and it's purpose so the risk is non uniformly distributed.

I tried to find out which issue commented out that line, to follow any past discussions, but I believe that it was inherited as is from Marlin4ST.

Is it possible to enable the watchdog and then add a health-check to display a Platform Specific warning, letting users know that if their board comes with the hardware watchdog, do xyz?

Plasmarobo commented 4 years ago

I can try to connect to my Creality 4.2.2 via SWD, think I have a hw debugger lying around someplace.

I will say I built a version of bugfix with the proposed changes to watchdog.cpp (the if-statement that reads the option bytes). the USE_WATCHDOG flag is set in my config, but I'm not experienced enough to know if it's actually using the hardware watchdog. Could be that codepath is just never being hit.

However it doesn't hang on boot, and I'm currently testing it out to see if it'll print benchy.

xC0000005 commented 4 years ago

The M200 and Lerdge boot loaders enable the boot loader even if the option byte doesn’t, so perhaps what is needed is a board-specific #define that enables only those that need it to skip init if so.

sjasonsmith commented 4 years ago

@xC0000005 are there Lerdge boards using this HAL? I thought they all used HAL/STM32.

I don’t have an F1-based Malyan board to play with. The one I have appears to be dead.

rhapsodyv commented 4 years ago

The M200 and Lerdge boot loaders enable the boot loader even if the option byte doesn’t, so perhaps what is needed is a board-specific #define that enables only those that need it to skip init if so.

But we already have a user option to enable/disable watchdog, right? The board pins file could undef it, and warn the user.

#define USE_WATCHDOG

sjasonsmith commented 4 years ago

But we already have a user option to enable/disable watchdog, right? The board pins file could undef it, and warn the user.

I believe for the boards in question we only need to disable initializing the watchdog. We still need to service it. Someone please correct me if I have misunderstood.

xC0000005 commented 4 years ago

That’s correct - the failure in both Arduino watchdog libraries for STM32 is that they’re written correctly and wait for the watchdog to enter an init/clear state (which it never will if it’s running). (STM32GENERIC doesn’t have this problem, but hopefully nothing still uses it). So it’s init, not service, which hangs.

The line in the watchdog is commented out because of my M200s hanging, and we even discussed at the time how this wasn’t ideal. I believe Victor’s plan was to change toe core to not hang, and mine was to detect if it was enabled already, but neither ever got done.

On Oct 9, 2020, at 10:30 AM, Jason Smith notifications@github.com wrote:

But we already have a user option to enable/disable watchdog, right? The board pins file could undef it, and warn the user.

I believe for the boards in question we only need to disable initializing the watchdog. We still need to service it. Someone please correct me if I have misunderstood.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MarlinFirmware/Marlin/issues/18226#issuecomment-706310004, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHVGS4OUDF3LRVUSH3QNM33SJ5CFBANCNFSM4NYGQEMQ.

rhapsodyv commented 4 years ago

I took the @xC0000005 suggestion and created a PR adding an option to disable the init, by the board: #19693

sjasonsmith commented 4 years ago

I think we should wait to close this until some users try it and verify that it works on a few different boards. If you have verified that the watchdog is now working for you, please reply with the hardware you tested with.

sjasonsmith commented 4 years ago

I just tested and verified the watchdog works on a BTT SKR E3 DIP.

I posted a PR to make watchdog testing easier. If you add the change from #19697, you can enable MARLIN_DEV_MODE and simply send a D100 command through serial. It will either reboot in about 5 seconds, or print an error after ~10-15 seconds.

qwewer0 commented 4 years ago

Without https://github.com/MarlinFirmware/Marlin/pull/19693 changes, for (https://github.com/MarlinFirmware/Marlin/pull/19697) D100, the printer hangs, then I get FAILURE: Watchdog did not trigger board reset., without reboot, and with https://github.com/MarlinFirmware/Marlin/pull/19693 changes, after D100 the printer hangs for a few sec, then reboots, so it looks good.

SKR Mini E3 v1.2

sjasonsmith commented 4 years ago

We've seen it working on a few different MKS and BTT boards now. I am going to close this bug.

If anyone encounter issues related to the watchdog on an STM32F1 boards, please report a new issue specific to your board.

github-actions[bot] commented 3 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.