RealOrangeOne / django-tasks

A reference implementation and backport of background workers and tasks in Django
https://pypi.org/project/django-tasks/
BSD 3-Clause "New" or "Revised" License
311 stars 22 forks source link

Concurrency issues on SQLite #33

Closed RealOrangeOne closed 2 months ago

RealOrangeOne commented 3 months ago

SQLite doesn't support select_for_update, which is a key component of the database worker to ensure that a task is only picked up by a single database worker.

In Django 5.1, it's possible to change how the transactions are created. If transactions are created using EXCLUSIVE, this solves the concurrency issues, although increases the risk of locking errors (something we already work around with a retry, but it can reduce throughput). Therefore, without EXCLUSIVE, the task semantic changes from "at most once" to non-deterministic "zero or more times", which is at best unhelpful and at most a great source of confusion and complexity.

I see 2 possible solutions for this:

  1. Require that EXCLUSIVE transactions be used if SQLite is in a system check, and prevent usage without it
  2. Backport the functionality into django-tasks. This is far more complex, but it allows users of Django 4.2 and 5.0 to use SQLite.
mustafa0x commented 3 months ago

update ... returning can be used in place of select for update, if I understand the issue correctly.

RealOrangeOne commented 3 months ago

You'd still need a lock on the table to ensure 2 UPDATEs weren't running at the same time, which I think could still cause concurrency issues. I'm also not sure RETURNING is supported by Django, and I'd much rather avoid any custom SQL if we can avoid it.

mustafa0x commented 3 months ago

It replicates a lock. Eg,

UPDATE jobs
SET started_at = NOW()
WHERE started_at IS NULL
RETURNING *
LIMIT 1;

So if a row is returned, a lock has been acquired.