Stuck in Safe Mode after Battery depletion

ATMakersBill commented 4 years ago

While I'm not trying to solve a solar-specific problem here, I have hit a problem on a solar project that seems like it would be an issue in other settings.

When my nRF52840 Sense Feather's battery drains down, CP goes into Safe Mode. In that mode it doesn't draw much power, so the battery never completely dies (I mean it will, but very slowly).

While in this mode, if the power is restored (i.e. the sun comes up and starts charging the battery), the Feather never resets. Even though there is now enough power (lack of power being the cause of Safe Mode), it will never recover.

I don't know enough about Safe Mode... is there any code running? Is there a chance to put a config setting where if Safe Mode is activated by a brownout (vs. other issues) it continues to watch for power to come back to a reasonable level and, if so, resets the board? In that scenario one of two things would happen:

1) Power is restored... reboot occurs and life is good
2) The battery truly dies... then, power is restored, reboot occurs, and life is good.

@dhalbert suggested just not going into Safe Mode on brownout... I'm not sure what happens then... but I'm sure he'll explain it.

Bill

dhalbert commented 4 years ago

Thinking about this, perhaps we can add some hysteresis to the process, and not go into safe mode on a brownout. Instead, if we get a brownout interrupt, we should wait for some period of time and check the voltage again. If the voltage is normal (3.3v) we can pretend that it was like a hard reset. If the voltage is still low, then we can loop and check again after some period of time. Eventually either the battery will drain completely, or the power will be restored to normal.

This mode might be selectable. There are two use cases I can think of which want to detect brownouts in different ways:

1. @ATMakersBill's example of a discharging battery which will become charged again.
2. Some external device, like a motor, is drawing too much power, causing a brownout. In that case the scenario above (waiting to recover full voltage and then restarting normally) is not a good choice. The program will run again, the motor will draw too much power, and then the brownout will happen again. This could cause physical damage eventually to something (e.g. if there was a short somewhere).

I think our original idea of safe-mode brownout was to handle cases like over-current. We had not thought a lot about sagging and recharging batteries. It's two different plausible scenarios.
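
The wait-and-recheck idea above could be sketched roughly as follows. This is a plain-Python simulation, not CircuitPython core code: `wait_for_power_recovery`, its parameters, and the 10-second interval are hypothetical, and `read_voltage` and `sleep` are passed in so the logic can be exercised without hardware. The 3.3 V recovery level is the figure mentioned above; keeping it above the brownout trip level is what would provide the hysteresis.

```python
def wait_for_power_recovery(read_voltage, sleep, recover_v=3.3,
                            interval_s=10.0, max_checks=None):
    """Poll the supply until it reads at least recover_v.

    Returns the number of polls taken when the voltage recovered,
    or None if max_checks polls all read low.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if read_voltage() >= recover_v:
            return checks          # caller would now restart as after a hard reset
        sleep(interval_s)          # still low: wait and look again
    return None


# Simulated brownout: two low readings, then the supply recovers.
readings = iter([2.5, 2.9, 3.3])
naps = []
polls = wait_for_power_recovery(lambda: next(readings), naps.append)
print(polls, naps)
```

With `max_checks=None` the loop only ends when the voltage recovers, which matches the "either the battery drains completely or power is restored" outcome: if the battery dies, the loop simply stops running.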

tannewt commented 4 years ago

> I don't know enough about Safe Mode... is there any code running? Is there a chance to put a config setting where if Safe Mode is activated by a brownout (vs. other issues) it continues to watch for power to come back to a reasonable level and, if so, resets the board? In that scenario one of two things would happen:

Safe mode runs all of the CircuitPython supervisor to give USB access to the filesystem but it doesn't run boot.py or code.py because it assumes something in the user code is fatal to the system.

> Thinking about this, perhaps we can add some hysteresis to the process, and not go into safe mode on a brownout. Instead, if we get a brownout interrupt, we should wait for some period of time and check the voltage again. If the voltage is normal (3.3v) we can pretend that it was like a hard reset. If the voltage is still low, then we can loop and check again after some period of time. Eventually either the battery will drain completely, or the power will be restored to normal.
>
> This mode might be selectable. There are two use cases I can think of which want to detect brownouts in different ways:
>
> 1. @ATMakersBill's example of a discharging battery which will become charged again.
> 2. Some external device, like a motor, is drawing too much power, causing a brownout. In that case the scenario above (waiting to recover full voltage and then restarting normally) is not a good choice. The program will run again, the motor will draw too much power, and then the brownout will happen again. This could cause physical damage eventually to something (e.g. if there was a short somewhere).
>
> I think our original idea of safe-mode brownout was to handle cases like over-current. We had not thought a lot about sagging and recharging batteries. It's two different plausible scenarios.

I'm not sure it's the job of the micro to monitor its own power. I know @ladyada just pointed out the UM803 power management IC for use with the i.MX RT, whose sole job is to hold a micro in reset until power is adequate.

The other thing to consider is the different implementations of brown out detection. The SAMDs detect the brown out as a reset source. So by this time, the chip had enough trouble with power that it was reset. I implemented this because I was accidentally setting all NeoTrellis pixels to full bright and dipping the power. Without safe mode it is impossible to recover from this without wiping the whole filesystem.

The nRF currently takes a different approach to the brownout by having an interrupt trigger the reset. Brownout isn't a reset reason we can read on startup. The warning level is configurable but the reset level is a fixed 1.7v (based on 5.3.1.6). Ideally we'd set the reset value to our own value and simply start up as normal once above that threshold.

The final wrinkle I can think of is the SPI flash. Although parts come in both 1.8v and 3.3v versions, I believe we always use the 3.3v versions. So even if the nRF is fine below 3.3v, the flash won't be. This is an argument for a configurable reset level or an external UM803, which is itself configurable.

My feeling is that a solar + battery powered circuit shouldn't attempt to start the MCU and flash up until it is outside the brownout range.

ATMakersBill commented 4 years ago

@tannewt I'm not sure I'm asking for that kind of support - it's not that I'm asking the system to monitor power etc. I'm just asking it not to go into a zombie mode that never resets. And I'm asking for it as a configurable option.

I think the simplest implementation of this request is to have a mode (configurable in boot.py) that says "On brownout, shut down all FLASH access and things that can corrupt filesystems, etc. and then periodically check to see if the brownout is over. If power has been good for a reasonable time, trigger reset."

It's possible that "check to see if the brownout is over" is not possible. In that case, I'd have the code wait a period of time (30 seconds?) and then just reset. If the power is still low, it will boot into CP, trigger the brownout, wait 30 seconds and reset again. Perhaps a solid check in the boot process to do any power checks that are available before enabling the FLASH would be good (seems that they'd be good anyway).
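
A rough sketch of the "power has been good for a reasonable time" check, as a plain-Python simulation rather than firmware code. Everything here is an assumption made for illustration: the function name, the 3.0 V "good" level, and the hold and poll times. `read_voltage` and `sleep` are injected so the logic can be tried without a board.

```python
def wait_until_power_stable(read_voltage, sleep, good_v=3.0,
                            hold_s=30.0, interval_s=5.0):
    """Return the seconds waited once the supply has stayed at or above
    good_v for hold_s seconds in a row. Any dip restarts the hold timer."""
    elapsed = 0.0
    good_for = 0.0
    while good_for < hold_s:
        sleep(interval_s)
        elapsed += interval_s
        if read_voltage() >= good_v:
            good_for += interval_s
        else:
            good_for = 0.0         # brownout not over yet, start counting again
    return elapsed                 # caller would trigger the reset here


# Simulated supply: good, good, one dip, then three good readings.
samples = iter([3.1, 3.2, 2.6, 3.1, 3.1, 3.1])
waited = wait_until_power_stable(lambda: next(samples), lambda s: None,
                                 hold_s=15.0)
print(waited)
```

If no voltage reading is available at all, the fallback described above (wait a fixed 30 seconds, then reset) amounts to skipping the check and calling the reset after a single sleep.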

The problem is that as it is, CP MUST be manually reset after a battery gets discharged and then recharged (or power plugged in). This is not a solar issue - it's just my setting. Yes, it will eventually reset, but with a large battery (I'm using a 2500 mAh) and the device in safe mode, that will take a LONG time to draw down from the brownout state to the powered off state... and by then, power will have long been restored.

Are we on the same page?

tannewt commented 4 years ago

I think we're mostly on the same page. @dhalbert has a pending PR to the bootloader to validate the power on start as you suggest: https://github.com/adafruit/uf2-samdx1/pull/111/files#diff-803c5170888b8642f2a97e5e9423d399R181

I don't think we need a configuration setting for this though because everyone should want it. It makes no sense to start when power is unreliable.

The only other bit is to ensure that the brown out doesn't lead to a safe mode on start up. We could do this by writing a sentinel in RAM which will get wiped when power dips or by tracking the reset time in the backup domain and only safe moding for short blips.

maholli commented 4 years ago

I wanted to chime in and say we've encountered @ATMakersBill's failure mode countless times with students building solar and battery powered projects.

Maybe I can frame it in a different light:

Regardless of how you get there, recovering from safe mode requires user intervention.

I think we need an ability to dictate safe mode behavior without hard-coding temporary fixes into main.c (for example).

tannewt commented 4 years ago

@maholli That is a good way to put it! I just filed #2795 and #2796 related to more low power work. The latter also needs a way to provide a start reason to user code. That could help the user's safe mode code too.

ita1024 commented 4 years ago

My Trinkets are not even on battery but are requiring too many manual resets, and adding more hardware starts to look expensive. Given that there is no quick fix/option in CircuitPython yet, I am looking into a workaround in the C code.

In my view, it would be ideal to exit safe mode after, for example, 2 minutes. In the function wait_for_safe_mode_reset (supervisor/shared/safe_mode.c), would it be fine to call reset_cpu(); after a few ticks, or would that reset the device into safe mode again?

Alternatively, would an immediate CPU reset be valid for my cases (never enter "safe mode")?

diff --git a/supervisor/shared/safe_mode.c b/supervisor/shared/safe_mode.c
index a167ab392..5a8ebd2d5 100644
--- a/supervisor/shared/safe_mode.c
+++ b/supervisor/shared/safe_mode.c
@@ -83,14 +83,14 @@ void safe_mode_on_next_reset(safe_mode_t reason) {

 // Don't inline this so it's easy to break on it from GDB.
 void __attribute__((noinline,)) reset_into_safe_mode(safe_mode_t reason) {
-    if (current_safe_mode > BROWNOUT && reason > BROWNOUT) {
-        while (true) {
-            // This very bad because it means running in safe mode didn't save us. Only ignore brownout
-            // because it may be due to a switch bouncing.
-        }
-    }
-
-    safe_mode_on_next_reset(reason);
+    //if (current_safe_mode > BROWNOUT && reason > BROWNOUT) {
+    //    while (true) {
+    //        // This very bad because it means running in safe mode didn't save us. Only ignore brownout
+    //        // because it may be due to a switch bouncing.
+    //    }
+    //}
+    //
+    //safe_mode_on_next_reset(reason);
     reset_cpu();
 }

tannewt commented 4 years ago

@dhalbert Can we close this? Didn't your bootloader changes fix this?

dhalbert commented 4 years ago

> @dhalbert Can we close this? Didn't your bootloader changes fix this?

The bootloader fixes were only for the SAMD bootloader, and weren't meant to cover this case, just the case where running at low voltage causes spurious flash writes.

If the power sags and then returns (the weak battery case), you'll still go into safe mode.

I did a little bit of experimentation and added microcontroller.on_brownout(runmode), so that on brownout you can go into RunMode.SAFE_MODE (the default) or RunMode.NORMAL. I was storing the state of that in RAM, but my experimentation shows that it's still too easy to get stuck in safe mode when the power comes back up, because RAM can get wiped. I think the proper solution is to store the state of microcontroller.on_brownout() in flash.

tannewt commented 4 years ago

@dhalbert The way you fixed it though ensures that power is 3.3v or above right? Maybe all of our bootloaders should ensure that.

dhalbert commented 4 years ago

No, the bootloader just ensures that the power is above the brownout detection voltage, which is 2.7V. 2.8V is the maximum detection voltage on nRF52. 3.3V is probably too high, since it limits the battery life, and also the voltage after regulation may be lower.

Even ensuring a high voltage doesn't necessarily help. For instance, once the program starts up, it may start up devices that draw significant current (such as a wifi adapter or LEDs), and those could cause the voltage to sag. So always avoiding safe mode, if requested, is the right thing to do. A program can monitor the voltage and decide to wait for a higher voltage, reset and try again, etc.

tannewt commented 4 years ago

Whose responsibility is it to make sure the external SPI flash voltage is high enough? The ability of the CPU to run may not match the voltage requirements of external chips on the board.

deshipu commented 4 years ago

I suppose the minimal voltage could be configured in the bootloader's configuration, together with the display stuff. Then it can be different per board, depending on what components are built in on it.

dhalbert commented 4 years ago

The SPI flash chips generally have a minimum operating voltage of 2.7V.

The nRF52840 has a forced reset when VDD is below 1.7V. It has a comparator that can generate an interrupt when VDD is below a set value. The maximum such value is 2.8V.

The SAMD51 can set the BOD33 brownout level up to about 3V. The SAMD21 can be even a bit higher.

The main issue, as I've mentioned, is that the battery voltage can be satisfactory at a light load to pass the voltage requirements, either in the bootloader or in CircuitPython. But once the program starts running, the battery voltage may dip due to increased load. Right now this triggers brownout protection and a safe mode reset. Once the board is in safe mode, the program does not run, and the board is stuck in safe mode while the battery continues to get charged, say by solar power. So the board can't exit safe mode, and nothing happens. This is the primary problem.

If, instead, the board simply reset into normal mode, then the program could run. In the worst case, the voltage would dip and the reset cycle would repeat over and over. If the charging rate exceeds the consumption rate, the reset cycle would eventually stop, and the program could run. A better approach would be for the program to check the voltage periodically and simply wait for a high-enough battery voltage before turning on devices that increase the current consumption.
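
As a sketch of that "better approach", user code could gate its high-current devices on the battery voltage with two thresholds, so the load does not chatter on and off around a single level. This is plain Python for illustration only: `LoadGate` and the 3.7 V / 3.4 V thresholds are hypothetical, and how the battery voltage is actually measured depends on the board.

```python
class LoadGate:
    """Decide whether high-current devices may be powered.

    The load is enabled only once the battery reaches on_v, and is
    disabled again only when it falls below off_v (hysteresis).
    """

    def __init__(self, on_v=3.7, off_v=3.4):
        if off_v >= on_v:
            raise ValueError("off_v must be below on_v")
        self.on_v = on_v
        self.off_v = off_v
        self.enabled = False

    def update(self, battery_v):
        if self.enabled and battery_v < self.off_v:
            self.enabled = False       # sagging: shed the load
        elif not self.enabled and battery_v >= self.on_v:
            self.enabled = True        # charged enough: allow the load
        return self.enabled


# Simulated battery readings taken once per loop of the main program.
gate = LoadGate()
states = [gate.update(v) for v in (3.3, 3.6, 3.8, 3.5, 3.3, 3.6)]
print(states)
```

The gap between the two thresholds would need to be larger than the sag the load itself causes, otherwise switching the load on would immediately switch it off again.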

deshipu commented 4 years ago

I'm not sure if this helps with anything, but I have described my struggles with safe mode and power dipping here: https://hackaday.io/project/158981-kubik-m0/log/180416-safe-mode-problems

Any ideas would be appreciated.

ita1024 commented 4 years ago

@deshipu The safe_mode.c workaround mentioned above works well for me so far.

deshipu commented 4 years ago

@ita1024 I would rather change the BOD33 level in port.c to something lower, and maybe enable the hysteresis. Especially since my boards don't have any other component with higher voltage requirements. However, I would like to publish my projects at some point, and have them added to the CircuitPython's repository, and that means I can't simply just hack the firmware.

Perhaps there could be an option for switching the minimal voltage per board?

dhalbert commented 4 years ago

@deshipu I put code in the UF2 bootloader to wait for 100ms after reaching the BOD33 level (2.7V). But I only did this on the SAMD51. I read your Hackaday post. Is that on SAMD51?

deshipu commented 4 years ago

No, that's SAMD21, sorry for not being specific. I suppose a delay would work in this particular case.

I still think it would be nice to be able to modify the level per board — I could make the PewPews work on battery much longer that way, for example, since they don't use flash.
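
To illustrate why a per-board level would help, here is a small sketch that picks the brownout level from the most demanding part on the board. It is not how CircuitPython configures anything: `board_brownout_level` is a hypothetical helper, and the only figures used are the two quoted in this thread (1.7 V nRF52840 forced reset, 2.7 V typical SPI flash minimum). A SAMD21 board would use its own MCU figure.

```python
# Figures quoted in this thread.
MCU_FORCED_RESET_V = 1.7   # nRF52840 forced-reset level
SPI_FLASH_MIN_V = 2.7      # typical SPI flash minimum operating voltage


def board_brownout_level(mcu_min_v, component_min_vs=(), margin_v=0.0):
    """Lowest usable brownout level for a board: the highest minimum
    voltage among its parts, plus an optional safety margin."""
    return max((mcu_min_v, *component_min_vs)) + margin_v


# A board with external SPI flash versus one without any.
with_flash = board_brownout_level(MCU_FORCED_RESET_V, [SPI_FLASH_MIN_V])
without_flash = board_brownout_level(MCU_FORCED_RESET_V)
print(with_flash, without_flash)
```

On these assumed numbers, a board without flash could keep running a full volt lower, which is the extra battery life being asked for.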

deshipu commented 4 years ago

@dhalbert I went ahead and created #3130 — let me know what you think.

J-wire commented 4 years ago

Hey everyone,

I am wondering if you guys have a timeline for resolving this issue. I have an M4 express that is getting stuck in safe mode and I am looking for solutions. Any recommendations, including breakout board solutions, would be super helpful.

dhalbert commented 3 years ago

I had another idea for easily signaling that you don't want safe mode on brownout, and that would be to simply add a file to CIRCUITPY that has a distinctive name we can check for. Something like:

SAFEMODE.OFF would turn off all resets into safe mode, including brownout. This might be sufficient.
BROWNOUT_SAFEMODE.OFF would just turn off brownout safe mode, etc.

This filename thing has the advantage of being immediately visible, and easily removable (by loading a CIRCUITPY eraser). It moves any such setting from being buried in the flash to being easily controllable.

A more complicated suggestion is to have a safemode.py that is always run on startup, even when restarting in safe mode, which could examine microcontroller.cpu.reset_reason (a newly added feature) and do a programmatic reset to get out of safe mode, or otherwise disable it in some way.

There are similar such flag files used, for example, in RPi, where you can create a file called ssh on the boot drive, which enables ssh.

tannewt commented 3 years ago

I'd rather not have special files that indicate a setting. boot.py is really for settings.

I'm ok having a safemode.py though. We'd just have to caveat it with a bunch of warnings.

ATMakersBill commented 3 years ago

I like the idea of having a file that is run even when started in safe mode, @dhalbert. That would let me perform a test of the batteries and make a decision based on my actual situation. I also like that it puts the solution in Python rather than having to choose from two or three options written in C.

Just to flesh this out, would there be limitations on the code in safemode.py? For example, would the SD card still be read-only? Would it be run before sensors are active or anything like that?

However, I'd love this as a solution, and volunteer to test anything you come up with.

Thanks Bill

tannewt commented 3 years ago

I think safemode.py would be the only thing to run. It can reset the micro to escape safe mode.
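
To make the proposal concrete, the logic of such a safemode.py might look something like the sketch below. It is written as plain Python with everything injected, because none of this is implemented yet: the reason labels ("brownout" and so on), the function names, the 3.5 V threshold, and the 60-second poll are all hypothetical. On a board, the `reset` callable would presumably be something like `microcontroller.reset`.

```python
def safemode_action(reason, battery_v, min_battery_v=3.5):
    """Decide what to do after starting in safe mode.

    Returns "reset" (leave safe mode), "wait" (battery still low),
    or "stay" (safe mode was entered for some other reason).
    """
    if reason != "brownout":
        return "stay"              # could be a real fault: keep safe mode
    if battery_v >= min_battery_v:
        return "reset"
    return "wait"


def run_safemode(reason, read_battery_v, sleep, reset,
                 interval_s=60.0, min_battery_v=3.5):
    """Poll the battery until it is safe to reset out of safe mode."""
    while True:
        action = safemode_action(reason, read_battery_v(), min_battery_v)
        if action == "reset":
            reset()
            return action
        if action == "stay":
            return action
        sleep(interval_s)


# Simulated run: battery recovers on the third reading.
volts = iter([3.2, 3.4, 3.6])
events = []
result = run_safemode("brownout", lambda: next(volts),
                      lambda s: events.append(("sleep", s)),
                      lambda: events.append(("reset",)))
print(result, events)
```

Only brownout-triggered safe mode is escaped here, so the over-current and crash cases that safe mode was designed for would still require user intervention.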

RabidObeseMan commented 2 years ago

Are these solutions applicable on a circuit board express? I am running into the same issue but am not sure how to implement any of the solutions above :(

dhalbert commented 2 years ago

These solutions would work anywhere, but we have not implemented them yet. They require core changes to CircuitPython.

RabidObeseMan commented 2 years ago

Ah, gotcha. I guess there are no workarounds at the moment?

dhalbert commented 2 years ago

That is right, sorry. You could look into using one of the TPL power switches to force a power cycle, or figure out some other way to hard reset or power-cycle the board.

adafruit / circuitpython

Stuck in Safe Mode after Battery depletion #2694