InfiniTimeOrg / InfiniTime

Firmware for Pinetime smartwatch written in C++ and based on FreeRTOS
GNU General Public License v3.0

0.8.0-develop: Firmware hangs, no watchdog reset #60

Closed dwagenk closed 4 years ago

dwagenk commented 4 years ago

Pressing the button to put the device to sleep while touching the screen causes the device to lock up. Rebooting it by long-pressing the button doesn't work. I had to open it and reset it by briefly shorting the power via the debug connector (which usually happens to me anyway when I try to fiddle the debug cables in there).

If you can't reproduce the behavior, I'll retry with a debug probe connected to try to get more information on what is actually happening.

[Attached video: V_20200902_222020_1]

lupyuen commented 4 years ago

I noticed similar behaviour with 0.8.0 too. I called it the "PineTime Defibrillator Syndrome"...

  1. I was charging PineTime on the cradle. It powered on. Pressing the button was OK

  2. While charging, the screen blacked out. Button did not respond. Maybe something came loose and it stopped charging

  3. I tried pressing the PineTime cover closed really tight and put it back on the cradle. Still nothing on the screen. Button did not respond

  4. Here's the very spooky thing... I opened the back cover. I tapped the Pogo Pins on the PineTime SWD Port. PineTime came back to life!

  5. The spooky thing: The Pogo Pins were connected to ST-Link. BUT ST-Link was NOT connected to USB. There's no power at all!

  6. I closed up the back cover again, took some pics, button was OK. After that the screen went blank. Maybe the battery drained

  7. I put PineTime back on the cradle, screen still didn't come on. Button did not respond

  8. I opened the back cover, tapped the Pogo Pins on the SWD Port again. It came back to life! Screen was OK, button was OK

So that's the PineTime Defibrillator Syndrome... Somehow it needs something to jolt it back to life

I recall some folks having a similar problem with other firmware... PineTime is charged up but doesn't turn on. Could it be the same issue?

dwagenk commented 4 years ago

Regarding points 4, 5 and 8: at least with my clumsy fingers, that behavior is due to briefly shorting VCC and GND with the debug header when trying to get it in there. That should trigger a reset, either through brown-out detection or, if that is not set up (I don't know whether it needs some configuration on Nordic MCUs), through the brief loss of supply voltage.

JF002 commented 4 years ago

@DWagenk Thanks for this very accurate description. The video was very helpful, and I can reproduce this issue too! The fact that it freezes is strange, but I cannot understand why the watchdog does not reboot the watch! I'll have to analyze that!

@lupyuen Maybe you hit the same bug as @DWagenk, and you created a small short when tapping the pogo pins in the SWD connector, which initiated a hardware reset?

JF002 commented 4 years ago

I think this bug is caused by a race condition between the IRQ and device (re)init when the watch is woken up: the ISR is called before the SPI/TWI devices are correctly reconfigured. This causes an infinite while loop inside the Display task. It doesn't trigger the watchdog because SystemTask is still running correctly and refreshes the watchdog.
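To illustrate why a watchdog fed by a single healthy task cannot notice a hang in another task, here is a minimal sketch of a check-in scheme in which the watchdog is only refreshed when every monitored task has reported progress. This is not InfiniTime's implementation: the class `TaskCheckIn`, its methods, and the bit constants are hypothetical names chosen for the example.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical task identifiers: one bit per monitored task.
constexpr std::uint32_t kSystemTaskBit = 1u << 0;
constexpr std::uint32_t kDisplayTaskBit = 1u << 1;

// Hypothetical helper (an assumption, not the firmware's code): collects
// "I am alive" check-ins and tells the supervisor whether all required
// tasks reported since the previous poll.
class TaskCheckIn {
public:
  explicit TaskCheckIn(std::uint32_t requiredMask) : required_{requiredMask} {}

  // Called by each task from its main loop when it makes progress.
  void CheckIn(std::uint32_t taskBit) { seen_.fetch_or(taskBit); }

  // Called periodically by the supervising task. Clears the collected
  // check-ins and returns true only if every required task reported.
  bool ShouldFeedWatchdog() {
    const std::uint32_t seen = seen_.exchange(0);
    return (seen & required_) == required_;
  }

private:
  const std::uint32_t required_;
  std::atomic<std::uint32_t> seen_{0};
};
```

With such a scheme, a display task stuck in an infinite loop stops checking in, `ShouldFeedWatchdog()` returns false, and the hardware watchdog would eventually expire even though the supervising task itself is still running.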

A quick fix is to disable the pushbutton IRQ for a short time (200 ms) after it has been triggered. This way, it won't be possible to request a wake-up while the system is still going to sleep. A better fix would require improving the sleep/wakeup workflow so that this race condition becomes impossible.

Still, the watchdog is running, and a long push (7-10 s) on the button prevents it from being refreshed, so the MCU actually resets. I'm not sure why, but it looks like the bootloader is then stuck somewhere, maybe also in the device initialization? @lupyuen any idea how to debug this?
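The lockout idea can be sketched as follows. This is only an illustration under stated assumptions: it ignores button events in software instead of disabling the IRQ in hardware, the class `ButtonLockout` is a hypothetical name, and the caller is assumed to supply a millisecond tick value.

```cpp
#include <cstdint>

// Hypothetical helper (not taken from the firmware): rejects button events
// that arrive within a lockout window after the last accepted event.
class ButtonLockout {
public:
  explicit ButtonLockout(std::uint32_t lockoutMs) : lockoutMs_{lockoutMs} {}

  // Returns true if the event at time nowMs should be handled.
  // Unsigned subtraction keeps the comparison valid across tick wrap-around.
  bool Accept(std::uint32_t nowMs) {
    if (hasLast_ && static_cast<std::uint32_t>(nowMs - lastAcceptedMs_) < lockoutMs_) {
      return false;  // Still inside the lockout window: drop the event.
    }
    hasLast_ = true;
    lastAcceptedMs_ = nowMs;
    return true;
  }

private:
  std::uint32_t lockoutMs_;
  std::uint32_t lastAcceptedMs_ = 0;
  bool hasLast_ = false;
};
```

For example, with `ButtonLockout lockout{200};`, an event at tick 1000 is accepted, events at ticks 1100 and 1199 are dropped, and an event at tick 1200 is accepted again.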

JF002 commented 4 years ago

I analyzed this a bit further, and it is more complex than previously anticipated: race conditions occur between SystemTask and DisplayApp. DisplayApp uses the SPI bus to draw on the display, while SystemTask decides when to put the devices to sleep. There is also asynchronous processing (touch IRQ and SPI DMA) that makes it all the more complex.

The bad news is that I managed to reproduce this in 0.7.1 too (just push on the button like crazy, the screen will eventually stay black).

This bug is more likely to happen in 0.8 RC because of the addition of the Sleep/Wakeup methods on the SPI and TWI, where the race condition puts the devices into an inconsistent state (the device is disabled while a transaction is running).
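One way to picture the inconsistent state described above is a small state machine that refuses to put a bus to sleep while a transaction is in flight. The sketch below is an assumption-laden illustration, not the fix that went into the firmware: `BusGuard` and its method names are hypothetical, and a real driver would also have to account for DMA completion interrupts.

```cpp
#include <atomic>

// Hypothetical guard (not the project's code): tracks whether a bus is idle,
// busy with a transaction, or asleep, and only allows legal transitions.
class BusGuard {
public:
  enum class State { Idle, Busy, Asleep };

  // Starts a transaction only if the bus is idle (awake and not in use).
  bool BeginTransaction() {
    State expected = State::Idle;
    return state_.compare_exchange_strong(expected, State::Busy);
  }

  // Marks the running transaction as finished.
  void EndTransaction() {
    State expected = State::Busy;
    state_.compare_exchange_strong(expected, State::Idle);
  }

  // Puts the bus to sleep only if no transaction is running.
  // The caller is expected to retry later when this returns false.
  bool TrySleep() {
    State expected = State::Idle;
    return state_.compare_exchange_strong(expected, State::Asleep);
  }

  // Wakes the bus up again; has no effect unless it was asleep.
  void Wakeup() {
    State expected = State::Asleep;
    state_.compare_exchange_strong(expected, State::Idle);
  }

private:
  std::atomic<State> state_{State::Idle};
};
```

With this guard, a sleep request issued during a transfer fails instead of disabling the peripheral mid-transaction, and a transfer requested while asleep fails until `Wakeup()` has run.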

I'll try to find an "easy" fix to unblock this 0.8.0 release. Unfortunately, I don't have much time for now to work on PineTime :/

We should also have a look at the bootloader: why can't it run properly after a watchdog reset when the SPI has been put into an inconsistent state?

JF002 commented 4 years ago

I pushed a workaround for this issue: https://github.com/JF002/Pinetime/tree/sleep-race-condition-workaround It seems to prevent a total freeze of the firmware, but sometimes the display shows garbage (a transaction is most certainly interrupted by the sleep mode).

I'll look for a better solution before releasing this workaround :)

JF002 commented 4 years ago

OK, I think I've found a better solution! It's now in develop, and I'll release version 0.8.1 RC for you to test! EDIT: here is the release: https://github.com/JF002/Pinetime/releases/tag/0.8.1-develop

dwagenk commented 4 years ago

Thanks for all the work you've put into this!

The problem hasn't appeared on 0.8.1 yet. I'll try a little more ("pushing the button like crazy") and report back if I encounter any problems.

yukdumboobumm commented 4 years ago

I can confirm this still occurs on 0.8.1. Some combination of button and touchscreen but nothing outside of typical user-behavior (I wasn't button mashing). Here's what I remember:

Firmware froze, gadgetbridge disconnected. Reset via shorting the pins. I've not been able to recreate it so something in the chronology is probably wrong or unimportant.

JF002 commented 4 years ago

@yukdumboobumm Thanks for your feedback. The default time and date are 1 January 1970, which is the Unix epoch. If the time was reset to this value, it most probably means that the firmware restarted (due to a crash or an empty battery).
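As a small illustration of why a freshly restarted clock shows that date, the sketch below formats a timestamp of zero seconds. It assumes `std::time_t` counts seconds since the Unix epoch (true on POSIX systems); `FormatUtcDate` is a hypothetical helper, not a function from the firmware.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: formats a timestamp as a UTC calendar date.
// Assumes std::time_t holds seconds since 1970-01-01 00:00:00 UTC.
std::string FormatUtcDate(std::time_t secondsSinceEpoch) {
  const std::tm* converted = std::gmtime(&secondsSinceEpoch);
  if (converted == nullptr) {
    return {};  // Conversion failed (value out of range).
  }
  char buffer[32];
  if (std::strftime(buffer, sizeof(buffer), "%Y-%m-%d", converted) == 0) {
    return {};  // Result did not fit into the buffer.
  }
  return buffer;
}
```

Under that assumption, `FormatUtcDate(0)` yields `1970-01-01`, which is the date a clock counting from zero reports before it has been synchronized.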

I've never noticed that the year would not be updated while the time was correctly sync'ed. If you can reproduce this behavior, could you please create a new issue?

I've tried many times to reproduce the crash you describe by swiping and pushing on the button, with no success. Can you reproduce it easily?

Note that in the meantime, @lupyuen fixed the bootloader. With this new version of the bootloader, even if InfiniTime freezes or crashes, the bootloader should be able to run correctly and restart the watch.

JF002 commented 4 years ago

I couldn't reproduce any crash with this version, so I'm closing this bug. Do not hesitate to reopen it if necessary.