database is locked on long running job

setop commented 3 years ago

continuation of #58

I have a daily job, written python (3.8). By design, it starts after litestream, which is always running. The job perform some insert and updates, generally 1000, at a pace of one to two per second. Some day, everything is fine, but once every couple of days, the script outputs errors : sqlite3.OperationalError: database is locked. Last time, it appends mid time (after 450 inserts) and all the way to the end : meaning not a single update or insert has been successful after that very first error.

Is there a way to prevent that ?

benbjohnson commented 3 years ago

@setop What version of Litestream are you using? Litesteam used to obtain a write lock frequently but it's been reduced significantly by https://github.com/benbjohnson/litestream/pull/170 in v0.3.4.

Litestream still requires a write lock during WAL checkpointing so you'll need to set a busy_timeout to avoid the database is locked error: https://litestream.io/tips/#busy-timeout

benbjohnson commented 3 years ago

@setop Did you have any luck with setting the busy_timeout in your application?

setop commented 3 years ago

@benbjohnson, yes it is setup now. I'll let it run a couple of days and report the result here. Thanks for the tip and thanks for the support.

setop commented 3 years ago

@benbjohnson, unfortunately, it happened again today after some days of good run.

Also, litestream was preventing truncation of journal. The journal was growing quite fast. With litestream disable I've been able to truncate it.

benbjohnson commented 3 years ago

@setop Hmm, that's odd. I have a server process that continually writes about 10 tx/sec all the time so 1 or 2 tx/sec shouldn't be a problem.

Is your application issuing a PRAGMA wal_checkpoint() command? Litestream maintains an open read transaction all the time to prevent checkpointing so that command will fail for PASSIVE checkpoints or hang for TRUNCATE or RESTART checkpoints.

setop commented 3 years ago

No PRAGMA wal_checkpoint() in my application. Only basic select|insert|update.

benbjohnson commented 3 years ago

@setop Can you provide some details about what Litestream version you're running, what operating system, and what CPU architecture? If you have a small script that can reproduce it that'd be great but I can try to repro on my side too.

setop commented 2 years ago

Hi, Sorry for the delay. Due to this issue I removed Litestream from my system, replaced it with an old fashion backup, then time passed. Since this project is very active and the it would be very valuable to me, I recently made a new attempt with version v0.3.8. I also ensure that the pragmas are properly set in my application :

PRAGMA journal_mode = WAL
PRAGMA synchronous = NORMAL
PRAGMA busy_timeout = 5000
PRAGMA wal_autocheckpoint = 0

Unfortunately, nothing changed. I still have sqlite3.OperationalError: database is lockedat some point and the process crash.

ribtoks commented 1 year ago

Same issue here. Using latest v0.3.9 as a systemd service and also having a simple Python script that from time to time does an insert. These inserts fail with "database is locked" error - like all of them.

The list of pragmas I use:

PRAGMA auto_vacuum = 0;
PRAGMA cache_size = -20000;
PRAGMA case_sensitive_like = true;
PRAGMA encoding = "UTF-8";
PRAGMA journal_mode = WAL;
PRAGMA locking_mode = EXCLUSIVE;
PRAGMA synchronous = NORMAL;
PRAGMA busy_timeout = 5000;

fractaledmind commented 6 months ago

Same issue here on v0.3.9. I am investigating to try and learn more, but because the database is locked rarely and randomly, investigation is difficult.

I am running Litestream on an Ubuntu 22.04 (LTS) server running amd64 (as reported by dpkg --print-architecture). Litestream is running as a systemd service, and was installed via this script:

#!/usr/bin/env bash
set -e

# Load environment
source /home/deploy/.bashrc

# Determine architecture of current env
arch=$(dpkg --print-architecture)

# Download the latest .deb file
wget "https://github.com/benbjohnson/litestream/releases/download/v0.3.9/litestream-v0.3.9-linux-$arch.deb"

# Install that .deb file using dpkg
sudo dpkg -i "litestream-v0.3.9-linux-$arch.deb"

# Verify it is installed
echo "Litestream version:"
litestream version

# Enable Litestream to run continuously as a background service
sudo systemctl enable litestream

# Start Litestream running continuously as a background service
sudo systemctl start litestream

# Verify the service is running
echo "Litestream service logs:"
sudo journalctl -u litestream

The database is used by a Rails application. When the database gets stuck in a locked state, application connects throw a SQLite3::BusyException: database is locked exception. I have checked every single occurrence of the error, but the most recent 5 all trace back a write within a transaction. Rails uses the bare TRANSACTION keyword, so transactions are are all DEFERRED mode, which is why at least in these cases the error is thrown immediately and not after the busy_timeout is exhausted. But, when problems are reported and I investigate, the problem does seem to be that the database has a connection that is stuck holding the write lock and no other connection can write, and the database has been stuck for over an hour before. I resolve by redeploying the app, which simply turns everything off and on again.

I doubt anything in there magically reveals the source of the issue, but I thought it might be helpful to give you as much information as I have at least.

loxs commented 1 month ago

This seems to be a problem with Python's sqlite driver: https://forum.djangoproject.com/t/sqlite-and-database-is-locked-error/26994

I have many databases that I backup with litestream and only the app written in python has this problem. Tried changing many options as in the comments above, nothing short of stopping litestream helps my python app to be able to finish.

hifi commented 1 month ago

If you use transactions with your application always use BEGIN IMMEDIATE if Litestream is running and it should prevent the locked issue given you have busy timeout set.

benbjohnson / litestream

database is locked on long running job #218