Open kalafut opened 7 months ago
Hmm, that's not good. Debugging failing integrity checks can be complicated and there's not built in debugging tooling in Litestream. Do you know if you're using any extensions with your SQLite database? It shouldn't matter but I figured I'd ask to double check.
No extensions are used.
What is interesting is that this occurs with an hours-old snapshot and just a few WALs:
➜ /tmp litestream restore -v -o scorekeeper.db s3://xxxxxxxxxxxxxxxx/sync_server
2024/04/22 13:54:48.459675 s3: restoring snapshot 0734a376671bf473/00000003 to scorekeeper.db.tmp
2024/04/22 13:54:48.550590 s3: restoring wal files: generation=0734a376671bf473 index=[00000003,00000004]
2024/04/22 13:54:48.655964 s3: downloaded wal 0734a376671bf473/00000004 elapsed=104.373292ms
2024/04/22 13:54:50.779728 s3: downloaded wal 0734a376671bf473/00000003 elapsed=2.228415666s
2024/04/22 13:54:50.792547 s3: applied wal 0734a376671bf473/00000003 elapsed=12.437458ms
2024/04/22 13:54:50.793711 s3: applied wal 0734a376671bf473/00000004 elapsed=1.15025ms
2024/04/22 13:54:50.793721 s3: renaming database from temporary location
I downloaded the snapshot from S3 and decompressed it (🎉 that stock lz4
worked). Integrity checks passed fine. I don't know how do to anything with the WAL files though. I'm happy to ship any of these files to a team member for dissection.
I'm using version v0.3.13 in a mostly vanilla fashion: single server replicating to S3. The only modifications have been to the retention setup:
Things had been working fine, including test restores. Even now replication is proceeding as normal. But about a week ago, I did a test restore and saw some database integrity errors. When I ran
pragma integrity_check
I saw stuff like:I restarted the Litestream service and just let things run, thinking maybe it was a glitchy WAL and would be flushed out (still not great, but...). A week later, a freshly restored file still exhibits the same issues. When I ssh into the server and run
pragma integrity_check
on the actual database, it always returnsOK
.I'm sure if I just wiped/vacuumed/restarted everything, thing would probably again run fine, but that's not answering any questions. Are there any diagnostics you'd like me to run on this bad state?