garethgeorge / backrest

Backrest is a web UI and orchestrator for restic backup.
GNU General Public License v3.0
1.72k stars 50 forks source link

1.6.x doesn't start on armv6 anymore #541

Closed arminus closed 1 day ago

arminus commented 3 weeks ago

Describe the bug Starting with 1.6.0, backrest doesn't work on a Pi-Zero anymore, 1.5.1 works fine

2024-11-07T20:55:41.834+0100 INFO writing logs to: /root/.local/share/backrest/processlogs unexpected fault address 0xb6c75fff fatal error: fault [signal SIGBUS: bus error code=0x2 addr=0xb6c75fff pc=0x6822c0]

To Reproduce See the attached log out.log

Platform Info Raspbian GNU/Linux 9.13 (stretch) Linux raspberrypi 4.19.66+ #1253 Thu Aug 15 11:37:30 BST 2019 armv6l GNU/Linux Backrest 1.6.1 (https://github.com/garethgeorge/backrest/releases/download/v1.6.1/backrest_Linux_armv6.tar.gz)

garethgeorge commented 2 weeks ago

Stack trace is very helpful, looks like this is related to the new dependency on sqlite introduced in 1.6.0. Thanks for reporting this.

I'm wondering -- does this happen first run on a fresh install? Or is it only on subsequent runs? It looks like the failure is coming from some bug in the WAL implementation that's hitting specifically on arm.

It could be related to running a few unnecessary pragmas in the init block each time I open the database / trying to change the page size here: https://github.com/garethgeorge/backrest/blob/main/internal/oplog/sqlitestore/sqlitestore.go#L67-L68 . It could also be a fundamental issue with WAL.

If you have an ability to build the binary yourself, you might try deleting those PRAGMA lines from the init block or I can provide you with a custom build to try out. I find it very suspicious that it's a WAL related crash that's triggering WHEN the database first tries to execute a query that changes mode and page size settings.

arminus commented 2 weeks ago

Trying ...

pi@raspberrypi:~/backrest/cmd/backrest $ go build . warning: GOPATH set to GOROOT (/home/pi/go) has no effect ../../webui/webuinix.go:15:12: pattern dist: no matching files found

Does the backend have a dependency on the webui? I'm pressed for space on that Pi-Zero, not sure if I manage to build the UI as well with all the npm dependencies...

garethgeorge commented 2 weeks ago

Ah, it does but can be “faked” by simply creating that directory it’s looking for and placing an empty file in it. Sorry you’re running into that!

arminus commented 2 weeks ago

Ok, managed to compile it with your change, still crashes:

out2.log

garethgeorge commented 2 weeks ago

Thanks for giving this a go -- confirmed with another raspberry pi user in https://github.com/garethgeorge/backrest/issues/536 that this does work on their raspberry pi device which means the crash is inconsistent. It's reproducible for you though and raspberry pi zero is a device that I'd expect to work. Unfortunately fixing this will be tricky as I don't have test hardware.

I'm using https://pkg.go.dev/modernc.org/sqlite which seems to be where the crash comes from. The package is pure go and is intended to be very easy to cross compile. It's also supposedly getting mature -- but it feels like we may be hitting an edge case on a less common architecture here.

The alternative is to look at switching to https://github.com/mattn/go-sqlite3 which depends on "C" sqlite , and introduce a zig toolchain to build it a bit like what this article describes: https://zig.news/kristoff/building-sqlite-with-cgo-for-every-os-4cic

Unfortunately not an easy fix, but perhaps worthwhile if I can't trust the stability of modernc sqlite.

In terms of remaining debugging steps, if you have the bandwidth I'd be very appreciative if you could try:

Note, if you only have bandwidth to try one more thing, I'd say try deleting the OpenWAL flag on the linked line above. I'm realizing that this is probably still causing the problem I was worried about with the pragma's I had you delete. If that works, I can publish a new release that disables WAL for 32 bit devices.

arminus commented 2 weeks ago

Thanks for giving this a go -- confirmed with another raspberry pi user in https://github.com/garethgeorge/backrest/issues/536 that this does work on their raspberry pi device which means the crash is inconsistent.

I have Pi4, it works without a problem there as well.

Also tested it on another Zero with a newer OS: Raspbian GNU/Linux 11 (bullseye), Linux raspberrypi 5.10.103-v7+ #1529 SMP Tue Mar 8 12:21:37 GMT 2022 armv7l GNU/Linux - same problem as on the first Zero. So it's not limited to just one OS/kernel on the Zero.

I changed L38 like so (for now on the same Pi Zero)

                Flags:    sqlite.OpenReadWrite | sqlite.OpenCreate,

Result:

2024-11-12T17:39:20.431+0100    INFO    writing logs to: /home/pi/.local/share/backrest/processlogs
2024-11-12T17:39:20.479+0100    WARN    operation log may be corrupted, if errors recur delete the file "/home/pi/.local/share/backrest/oplog.sqlite" and restart. Your backups stored in your repos are safe.
2024-11-12T17:39:20.483+0100    FATAL   error creating oplog: init sqlite: sqlite: step: SQL logic error: cannot change into wal mode from within a transaction

(of course I deleted the file as suggested and restarted, same issue)

but there's another

PRAGMA journal_mode=WAL;

here: https://github.com/garethgeorge/backrest/blob/main/internal/oplog/sqlitestore/sqlitestore.go#L67 - maybe that also needs to be set to something other than WAL?

garethgeorge commented 2 weeks ago

Thanks for digging into this and trying on another device, appreciate the time you're spending looking at it.

It's very encouraging that it's getting further with WAL mode disabled on open. I think you should remove both of https://github.com/garethgeorge/backrest/blob/main/internal/oplog/sqlitestore/sqlitestore.go#L38 and also the pragma you linked on https://github.com/garethgeorge/backrest/blob/main/internal/oplog/sqlitestore/sqlitestore.go#L67. I'm fairly optimistic that'll get backrest to a place where it's starting up successfully.

In general, any reference to WAL in the codebase will probably need to be disabled. You may run into a similar problem with the log store so you'll want to disable WAL here https://github.com/garethgeorge/backrest/blob/main/internal/logstore/logstore.go#L46 and here https://github.com/garethgeorge/backrest/blob/main/internal/logstore/logstore.go#L80 (anywhere it's referenced in the code).

If WAL is the root cause I'll likely go ahead and open an issue on https://pkg.go.dev/modernc.org/sqlite as well. If the fixes work and you can link me your fork, I'll also make sure I include a patch for this going forward with the next release.

arminus commented 2 weeks ago

I now removed both pragmans in L67-68 as well as the | sqlite.OpenWAL in L38, no improvement: out3.log

garethgeorge commented 2 weeks ago

Hey, would you mind uploading your fork and possibly linking your fork so I can inspect the changes and possibly PR any additional fixes that are needed to get this working in your fork? It looks like the stack trace is still coming from executing the script that contains the pragma PRAGMA journal_mode=WAL; block which is confusing / it looks like the DB is still trying to use WAL mode.

I've gone ahead and opened an issue with your stack trace on the underlying database driver: https://gitlab.com/cznic/sqlite/-/issues/197

arminus commented 2 weeks ago

Here you go: https://github.com/garethgeorge/backrest/compare/main...arminus:backrest:main

garethgeorge commented 2 weeks ago

Thanks for uploading that -- just PR'd another fix https://github.com/arminus/backrest/compare/main...garethgeorge:backrest:patch-1 based on some feedback I got from the sqlite driver developer in https://gitlab.com/cznic/sqlite/-/issues/197 .

If you can give yet another build with that dependency change a go, I'm fairly optimistic that a libc version mismatch would explain this.

arminus commented 2 weeks ago

The downgrade fixes the problem, see https://github.com/garethgeorge/backrest/pull/559 for the full PR (WAL flag and pragma removal rollback)

garethgeorge commented 1 day ago

Fixed in the latest release, thanks again trying those patches / helping track this down. Glad we were able to get it fixed.