espruino / BangleApps

Bangle.js App Loader (and Apps)
https://banglejs.com/apps
MIT License
478 stars 1.14k forks source link

[Banglejs 2] Filesystem corrupted #2865

Closed pavelmachek closed 1 year ago

pavelmachek commented 1 year ago

Affected hardware version

Bangle 2

Your firmware version

2v18

The bug

It seems like something is wrong with the filesystem. My own application that writes log hangs on startup. "Run" using "recorder" hangs when I tell it to create new track. I can connect with ide, can get file listing, and can even retrieve my log file.

When I attemp to start my app from IDE, I get lot of:

FW addr 0x00448800 fail Status 0 FW addr 0x00448900 fail Status 0 FW addr 0x00448a00 fail Status 0

Any ideas how to debug this? Should I try to preserve the filesystem somehow?

Installed apps

widlock (0.08), mylocation (0.09), daisy (0.10), magnav (0.06), devstopwatch (0.04), scicalc (0.01), gpsnav (0.08), flow (0.01), f9lander (0.03), orloj (0.01), ac_ac (1.00), astrocalc (0.04), sixths (0.01), widbat (0.11), widclk (0.07), waypoints (0.01), widadjust (0.02), dtlaunch (0.23), altimeter (0.03), weather (0.25), sched (0.22), messageicons (0.05), waypointer (0.09), waypoint_editor (0.01), alarm (0.41), health (0.23), openstmap (0.17), widmessages (0.05), bordle (0.03), notify (0.13), run (0.16), infoclk (0.09), pebbled (0.04), clock_info (0.05), boot (0.58), setting (0.61), messagegui (0.71), messages (0.60), drained (0.03), recorder (0.27)

pavelmachek commented 1 year ago

I can reproduce the hang by opening file from cmdline.


>let logfile = require("Storage").open("sixths.egt", "a");
=StorageFile: {
  name: "sixths.egt",
  chunk: 4, offset: 0, addr: 0, len: 4802139, mode: 97 }
>logfile.write("\n");
``

How big is the flash? Can the length be correct? Should I just delete the file?
pavelmachek commented 1 year ago

When I receive the file using ide, I get:

-rw-r--r-- 1 pavel pavel 122784 Jul 5 21:31 /tmp/sixths.egt

I deleted it using ide, but the hang still remains.

pavelmachek commented 1 year ago

I'm getting

weather.wid.js SyntaxError: SyntaxError: Got '*' expected ':'

when trying to launch code from IDE, then the "FW addr" flood described above. It ends with:

FW addr 0x007d9fc8 fail Status 0 FW addr 0x007da000 fail Status 0 FW addr 0x006c1fc0 fail Status 0 Execution Interrupted at line 6006 col 2 in " 0 # 2"0,444.  ("iDH@AC ^ in function "reload" called from line 6015 col 579 in " 0 # 2"0,444.  ...A@"AA"A@dA@ A!AA!@A`AA 0A ^ at line 1 col 36 try{eval(require("Storage").read(a))}catch(c){print(a,c)} ^ in function called from line 1 col 147 ...d(a))}catch(c){print(a,c)}});var b=WIDGETS;WIDGETS={};Object... ^ in function called from line 1 col 20 Bangle.loadWidgets(); ^ New interpreter error: FIFO_FULL SyntaxError: SyntaxError: Got ?[16] expected EOF Looks like some other app used setTime, so reset clockError. (elapsed = 130137) SyntaxError: SyntaxError: Got ?[16] expected EOF 2023-07-05T19:37:12.266Z QUIT & SAVE STATE

0.14 - 34.59 = -34.45 (-42.7 PPM) ; 47.1 - 47.1 = 0.0 FW addr 0x006c2400 fail Status 0 Uncaught Error: File already written with different data at line 6001 col 308 in " 0 # 2"0,444.  @aCFhh0 "&`#(!(aDap!#d ,p(.hb.d- (a,pb.,0)3I`,... ^ in function called from system at line 1 col 3176 in widadjust.wid.js ...Math.round(ppm*1000)/1000,});} ^ in function called from system FW addr 0x006c2400 fail Status 0 Uncaught Error: File already written with different data at line 5 col 369 in .bootcde ....writeJSON("setting.json",s);}} ^ Uncaught ReferenceError: "ixths" is not defined at line 1 col 6 ixths.egt", "a"); ^ Uncaught ReferenceError: "start" is not defined at line 1 col 1 start(); ^

pavelmachek commented 1 year ago

I forced reboot.

Now I got "Checking system" "Storage is corrupted" ?? something about formatting the area

It seems my watch now factory reset :-(.

gfwilliams commented 1 year ago

Hi, sorry - yes, it looks like Storage got corrupted somehow. Unfortunately as you say, when you forced the reboot it realised something was wrong and reset itself to factory default to try and keep working.

If you update to a cutting edge firmware, it now doesn't do that and instead shows a 'recovery menu' asking you what to do - it's a shame I hadn't done a proper release with that in time for when this happened to you or we could have debugged it.

The FW addr errors mean it's tried to write to flash memory, but the value in the memory after writing isn't what it expected.

What does your "custom logging app" look like?

These errors seem very rare now, but I've had a hunch for a while it might be related to StorageFile writes.

This does at least provide some clue:

=StorageFile: {
  name: "sixths.egt",
  chunk: 4, offset: 0, addr: 0, len: 4802139, mode: 97 }

Did you get a change to try this repeatedly? Was the len field always that long?

There is 8MB of flash so theoretically it could be ok, but when new StorageFile blocks are created they are 40928 bytes, so something is wrong here.

Looking at the code it's possible there is an issue with how open(..., "a") works when the file is a certain length - I'll take a look now

gfwilliams commented 1 year ago

I couldn't reproduce anything apart from the longer file length (which I could do by writing a file of exactly the chunk size and then trying to append to it), but that itself was a little concerning.

It felt like with the longer file length it'd have been possible to then do some writes which pushed it past that, but because addr=0 the write would have allocated a new file and then set the length up properly.

Anyway, I just pushed some changes to totally remove len and addr fields and instead look them up each time, because they were kind of asking for trouble. Hopefully that will finally fix this issue.

On Espruino, it's commit 4e5ec62d20cd061c418a23b804cbf7fcc99e98a6

pavelmachek commented 1 year ago

Can you try logging into Storage file while running out of battery? That may be what happened here. I'm experimenting with low-power GPS logger...

gfwilliams commented 1 year ago

Ahh, you didn't mention that in the issue!

Running out of battery while writing shouldn't really be an issue unless you're super unlucky - as long as the first 4 bytes of the header manage to get written ok, storage won't be corrupt - failed writes in the middle of a file will be ok.

About the only potential issue is if you were low on storage and battery, and then the storage automatically compacted (which could take a minute of high power usage) and during that time the battery died. That would cause you a bunch of trouble.

But it's hard to know what to do in that case... Do we just refuse to compact (and so refuse to write any data to files) if the battery is below 10%?

pavelmachek commented 1 year ago

Sorry about not mentioning it earlier. I guess documentation should warn about this, as I assumed power downs are not expected to cause problems.

So... "best" solution would be to do journalling filesystem, so that powerdown at arbitrary time can't result it problems. (Like ext3 on Linux). I guess that's too complex to be practical.

So as you mention... checking battery before writes would make sense. If it is low (3%?) not starting the write and going onto special mode a-la "Drained" would make sense.

gfwilliams commented 1 year ago

Well, we do have a journalling filesystem 99% of the time, which is why normally a power cut during writes would cause no problem.

The only issue is when we have to 'compact' - removing the deleted journal entries.

Potentially there could be some kind of 'rolling' compaction implemented that only does a few pages at a time, but I think that's a step too far here

bobrippling commented 1 year ago

I'm seeing FW addr <addr> fail too, is the best approach to allow the watch to reload, and use the recovery menu? It's not prompted me yet, so I assume the error's still transient - I could aid in debugging the issue if you'd find that useful?

gfwilliams commented 1 year ago

Thanks! Is there anything in particular you think caused it, and what firmware are you on?

Please could you turn on the Web IDE's logging, then do:

function toIntelHex(saveAddress, saveSize) {
  var f = require("Flash");
  // Now work out intel hex
  function h(d) { var n = "0123456789ABCDEF"; return n[(d>>4)&15]+n[d&15]; }
  function ihexline(bytes) {
    var sum = 0;
    bytes.forEach(function(a) {sum+=a;});
    bytes.push(1+((~sum)&255)); // checksum
    return ":"+bytes.map(h).join("");
  }
  var lastHighAddr = -1;
  for (var idx=0;idx<saveSize;idx+=16) {
    var addr = saveAddress+idx;
    var highAddr = addr>>16;
    if (highAddr != lastHighAddr) {
      lastHighAddr = highAddr;
      Bluetooth.println(ihexline([2,0,0,4,(highAddr>>8)&255,highAddr&255]));
    }
    var bytes = [
      16/*bytes*/,
      (addr>>8)&0xFF, addr&0xFF,
      0]; // record type
    bytes=bytes.concat(f.read(16,saveAddress+idx).slice());
    Bluetooth.println(ihexline(bytes));
  }
  Bluetooth.println(":00000001FF");
}
toIntelHex(0x60000000,0x800000)

And save it to a file? It'll contain all of storage so if there's any personal info it might be better to email it to me.

It'll take a long time though as it's dumping all 8MB of flash, but hopefully I can then effectively 'clone' your watch here and try to see what's gone wrong.

bobrippling commented 1 year ago

Sure - this is currently going off the IDE scrollback, is there a way I could run it from a terminal or otherwise dump the bluetooth console to a more permanent storage?

I attempted to use noble-mac but node aborts when I attempt to run the mentioned program

gfwilliams commented 1 year ago

Ahh, sorry - yes. Scrollback won't be big enough.

Go to Setting -> General -> Terminal Log Icon then use the icon in the top left to start logging - that's what I meant above :)

bobrippling commented 1 year ago

Ah no that's my fault, I didn't read your message about the IDE's logging fully. Thanks for the detail - I'm afraid I left my watch running with this and the battery died part way through, so I've only a partial dump (storage was wiped on reboot - not sure if a resumed dump would be any good to you, perhaps the old storage was only partially cleared?)

gfwilliams commented 1 year ago

Argh, what a pain! Was it 2v18 or something that didn't have the recovery menu?

A partial dump may help - it's possible the issue was early on in the dump... but one from after the storage wipe won't help

bobrippling commented 1 year ago

It's 2v18 but oddly no recovery menu - is there a way to trigger it in that state to double-check? I notice when I do reset my watch that it says DFU2v15 (I think, I only see it very briefly), but my watch's firmware reports 2v18:

>process.env.VERSION
="2v18"
gfwilliams commented 1 year ago

Yes, having the older DFU is no problem. But I don't think 2v18 has recovery - it's the cutting edge builds that do (and interestingly they have some changes that may make the corruption less likely to happen too)

bobrippling commented 1 year ago

Ah gotya - I've emailed the (partial) flash dump across, please let me know if there's anything else I can help with

gfwilliams commented 1 year ago

Great - thank you!