Closed — xrd closed this issue 9 months ago
I rebooted the litestream process and now see a bunch of files inside the WAL directory. I assume this means litestream was not running correctly (though, to be honest, this is confusing because I didn't reconfigure it before rebooting). This is probably why I got the "no matching backups" message: there were no WAL files to recover from at that date.
I've been reviewing the code and think that either my expectations about the files litestream creates on S3 are incorrect, or there is a bug in the way restore works with timestamps.
I expected that if I run litestream, it stores checkpoints of the database (https://litestream.io/how-it-works/). If I understand this correctly, I should see a bunch of checkpoints/generations in S3. I don't see these, so perhaps I have misconfigured litestream. To be clear, I do see litestream publishing a db and WAL into a generations/UUID directory, but only a single file in each.
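For what it's worth, litestream has subcommands for inspecting what it has stored. A sketch of how I'd check (the database path here is a placeholder for my actual configured path):

```shell
# List the generations litestream is tracking for this database
litestream generations /data/pocketbase/data.db

# List the snapshots available within each generation
litestream snapshots /data/pocketbase/data.db
```

If my understanding is right, a healthy setup over several hours of writes should show snapshots and multiple WAL segments here, not just one file.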
A note on the system: we had a bug in production, so during an event I rebooted the litestream process (and its subprocess, the SQLite-backed tool pocketbase). It used the same database file, but I wonder whether this caused litestream to lose track of the checkpoints/generations, and whether that is why my timestamp restoration fails.
At a minimum, the error message (`no matching backups found`) seems off, because there are backups, just perhaps not at this timestamp (restore works with the same config file but without the timestamp switch). I have this structure on my production system:
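For reference, the two restore invocations I'm comparing look roughly like this (config path, output names, database path, and timestamp are placeholders; `-timestamp` takes an RFC 3339 value):

```shell
# Restore the latest snapshot + WAL -- this works:
litestream restore -config /etc/litestream.yml \
  -o restored.db /data/pocketbase/data.db

# Restore as of a point in time -- this fails with "no matching backups found":
litestream restore -config /etc/litestream.yml \
  -timestamp 2023-06-01T12:00:00Z \
  -o restored-at-ts.db /data/pocketbase/data.db
```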
Does this indicate I have only two checkpoints? And since there are tens of thousands of records spanning an event that lasted three or four hours, does this indicate there was an issue with litestream or with my configuration of it? Specifically, should there be many more WAL files?
I also see WAL files alongside the database itself:
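One way I can think of to cross-check is to compare the WAL segments litestream has replicated against what is actually in the bucket and the live `-wal` file next to the database (this assumes the AWS CLI is available; bucket and paths are placeholders):

```shell
# WAL segments litestream has replicated, per generation
litestream wal /data/pocketbase/data.db

# Raw view of what is actually in the bucket
aws s3 ls --recursive s3://my-bucket/data.db/generations/

# The live SQLite -wal file next to the database
ls -l /data/pocketbase/data.db-wal
```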
Discussed in https://github.com/benbjohnson/litestream/discussions/563