espruino / BangleApps

Bangle.js App Loader (and Apps)
https://banglejs.com/apps
MIT License
498 stars 1.17k forks source link

Data corruption after battery empty #3398

Open pavelmachek opened 7 months ago

pavelmachek commented 7 months ago

Affected hardware version

Bangle 2

Your firmware version

2v21

The bug

I was using watch for gps logging, and battery ran flat for first time in maybe half a year. And... data corruption after I charged it. So it seems like the old issue is still there. Should I try to create some kind of reproducer?

Installed apps

sixths (0.03) was used for gps logging, it seems to trigger corruption quite easily.

bobrippling commented 7 months ago

I was using watch for gps logging, and battery ran flat for first time in maybe half a year. And... data corruption after I charged it. So it seems like the old issue is still there. Should I try to create some kind of reproducer?

Yes please - do you have a link to the old issue too?

pavelmachek commented 6 months ago

Old issue is: https://github.com/espruino/BangleApps/issues/3000

pavelmachek commented 6 months ago

If you want to try to reproduce it in the meantime, install sixths and do "topright"- "bottomright" -"bottomleft" gesture. It will turn on GPS. I can reproduce it way too easily on horseback trips.

I'll try to hack up better reproducer.

bobrippling commented 6 months ago

Great, thank you (also tagging #2865 for reference too). I'll tag @gfwilliams just so you're aware, in case there was any storage changes you're working on in the meanwhile

gfwilliams commented 6 months ago

sixths (0.03) was used for gps logging, it seems to trigger corruption quite easily.

Have you had corruption more than this one time in the last 6 months then?

At least now it should have popped up the recovery screen when you rebooted, and you have the choice to compact which may fix it, or you can connect via Bluetooth with the app loader at that point, do a backup, and then restore it and that'll definitely fix it too.

One issue is that storage is only scanned at boot, so we don't actually know whether the corruption did happen when the battery died, or whether it happened before and you only noticed when the watch rebooted.

Potential fixes for this might be:

pavelmachek commented 6 months ago

So I'm trying to reproduce the corruption, and no success so far. No, I did not detect corruption in last 6 months.

I guess "storage is only scanned at boot" might be my problem. I suspect I did heavier-than-average use in the past. Is there way to force filesystem check on runtime -- maybe we should be doing that periodically...

pavelmachek commented 6 months ago

(I guess I should try to simply reboot watch after "heavy" use, and see if I can find some kind of pattern? Is Settings|Utils|Turn off then turn it back on way to do that?)

gfwilliams commented 6 months ago

No, I did not detect corruption in last 6 months.

Ahh, ok - it's just you said "sixths triggers corruption quite easily" so I thought maybe it had done it a few times.

Is there way to force filesystem check on runtime

Actually right now there isn't, but I guess that could be exposed.

Using turn off is probably nicer, but you could long-press the button (although I guess that risks potentially rebooting while writing to flash)

pavelmachek commented 6 months ago

So, yes, I did have it few times, but it was more than 6 months ago :-).

So... what are the expected results of rebooting while writing to flash? I'm used to ext4 where there should be no ill results; recent writes may be lost but things should be consistent. How does BangleJS compare?

pavelmachek commented 6 months ago

Ok, so I drained battery maybe 10 times this morning, then did some tweaks from App loader, and now App loader no longer connects. But watch still reboots ok.

pavelmachek commented 6 months ago

For the record, may battery draining script was this:

function eat(on) { Bangle.setOptions({backlightTimeout: 0}) Bangle.setGPSPower(on, "sixths"); Bangle.setCompassPower(on, "sixths"); Bangle.setBarometerPower(on, "sixths"); Bangle.setHRMPower(on, "sixths"); Bangle.setBacklight(on); if (on) { for (let i=0; i<10000; i++) ; } }

function logstamp(s) { logfile.write("utime=" + getTime() + " " + s + "\n"); }

drain = 0; cooldown = 0;

function periodic() { bat = E.getBattery();

g.setColor(1, 1, 1); g.fillRect(0, 24, 240, 240); g.setFontAlign(-1, 1); g.setColor(0, 0, 0); g.setFont('Vector', 26); let now = new Date(); msg = now.getHours() + ":" + now.getMinutes() + " " + bat + "% "; msg2 = cooldown + "/" + drain; g.drawString(msg, 10, 90); g.drawString(msg2, 10, 130);

print(msg + msg2); logstamp(msg + msg2) if (bat < 2) { if (drain) drain--; else { if (cooldown) { eat(0); cooldown--; } else { cooldown = 760; drain = 260; require("Storage").compact(true); eat(1); } } } else { eat(1); drain = 2*60; } }

let logfile = require("Storage").open("crash.log", "a"); setInterval(periodic, 1000);

gfwilliams commented 6 months ago

and now App loader no longer connects. But watch still reboots ok.

It might just be your PC's bluetooth connection is confused. I doubt it's a bangle issue.

If power fails while writing to flash I think it's similar, but the issue is really if it's writing a new file's header and those 32 bytes end up partially written. It'd be extremely unlikely for that to happen though

pavelmachek commented 6 months ago

I have solid connection from the Web IDE, yet app loader has deterministic problems. Compaction still works:

require("Storage").compact(true); =undefined

require("Storage").getStats(true) ={ totalBytes: 8388608, freeBytes: 2427696, fileBytes: 5960912, fileCount: 346, trashBytes: 0, trashCount: 0 }

But I seem to read all the files okay,I'll try to debug it a bit more.

pavelmachek commented 5 months ago

Ok, so now I have corruption, this time .boot0 contains 0xff's . I don't recall running battery flat, recently. (Is there easy way to fix it?)

00000ce0: 6c 3b 61 2e 6d 6f 76 65 6d 65 6e 74 2b 3d 65 2e l;a.movement+=e. 00000cf0: 63 68 61 72 43 6f 64 65 41 74 28 33 29 3b 61 2e charCodeAt(3);a. 00000d00: 6d 6f 76 43 6e 74 2b 2b 3b 0a 6c 26 26 61 2e 62 movCnt++;.l&&a.b 00000d10: 70 6d 43 6e 74 2b 2b 7d 68 2d 3d 34 7d 61 2e 62 pmCnt++}h-=4}a.b 00000d20: 70 6d 43 6e ff ff ff ff ff ff ff ff ff ff ff ff pmCn............ 00000d30: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ 00000d40: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ 00000d50: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ 00000d60: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ................ 00000d70

gfwilliams commented 5 months ago

That's an odd one - if you look at the list of files in the Web IDE (click the storage icon towards the middle of the screen) are there any filenames that look corrupt?

If you just go to the Settings app then Utils and rewrite settings, and that should fix it?

It seems to me like somehow the bootupdate.js file got interrupted in the middle of writing .boot0 after the settings were changed. I've never seen that before, and I'm not quite sure I see how that could have happened unless it took a very long time to run, during which you were maybe holding the button?

pavelmachek commented 5 months ago

Filenames look ok in the web ide. I had few files truncated to 0xffff bytes in the past. Let me try the "rewrite settings".

Is there documentation of filesystem properties somewhere? Knowing which actions can lead to corruption would be good. I now know "battery running empty" is dangerous. Is the "2 second press" dangerous? Is the "5 second press" dangerous?

pavelmachek commented 5 months ago

(And yes, "rewrite settings" did the trick for me. Thanks!)

gfwilliams commented 5 months ago

I now know "battery running empty" is dangerous.

In recent firmwares even that isn't bad - it just means that maybe if you're super unlucky a file header will be partially written, but you do have to be very unlucky for that.

Is the "2 second press" dangerous? Is the "5 second press" dangerous?

They shouldn't actually be very dangerous now either. I don't think it's very easy for a write to end up getting completely interrupted (apart from with the 5 second press) but even then it's just that a file ends up partially written like in this case.

All I can think happened for you is maybe you changed a setting and then the storage needed compacting when .boot0 was updated (it should display a message onscreen now), then maybe you held down the button to try and break out, and that caused Espruino to break out of execution. It seems very unlucky though.

Do you remember doing anything in particular right before you started noticing the error?