Encountering SQLITE_BUSY (or SQLITE_BUSY_SNAPSHOT) on multi process concurrent queries only when using transactions

solesignal commented 1 year ago

In a scenario where two parallel Node processes using better-sqlite3 perform the same UPDATE queries wrapped in a transaction (either using BetterSQLite API, or manually using BEGIN, COMMIT) concurrently, one UPDATE occasionally throws an exception SQLITE_BUSY (journal DELETE) or SQLITE_BUSY_SNAPSHOT (WAL) well before the 5s retry timeout has passed. This behaviour disappears if the enclosing transaction is removed.

My understanding from the documentation would be that even with an enclosing transaction, better-sqlite3 should perform a retry automatically within the timeout specified. If this understanding is incorrect, I would appreciate some insight (and possibly extension in the documentation) as to how the transaction scenario is different and how to handle such a situation gracefully in the better-sqlite3 using code.

Prinzhorn commented 1 year ago

I did a quick peek (not an expert on that topic):

better-sqlite3 passes the timeout to sqlite3_busy_timeout:

https://github.com/WiseLibs/better-sqlite3/blob/ae23e690b02c00d075d543c66ae7e26c98c46f74/src/objects/database.lzz#L177

but it never calls sqlite3_busy_handler which means the timeout takes no effect? Because:

If the busy callback is NULL, then SQLITE_BUSY is returned immediately upon encountering the lock. [...] The default busy callback is NULL.

But in addition the docs also say:

The presence of a busy handler does not guarantee that it will be invoked when there is lock contention. If SQLite determines that invoking the busy handler could result in a deadlock, it will go ahead and return SQLITE_BUSY to the application instead of invoking the busy handler.

So even with properly configured timeout you can run into a case where you immediately get SQLITE_BUSY. This could be the case for you, you need to provide a minimal repo for that. But in either case this would be a SQLite issue because better-sqlite3 itself does not actually implement any timeout or retry logic.

@JoshuaWise am I wrong with my conclusion or does the timeout option not actually do anything without calling sqlite3_busy_handler? But I'm pretty sure I must be wrong because there are tests for the timeout. That SQLITE_BUSY can be thrown regardless holds true either way, which answers this issue.

Prinzhorn commented 1 year ago

Also see https://github.com/WiseLibs/better-sqlite3/issues/155

solesignal commented 1 year ago

Ok - in short this then means that irrespective of any better-sqlite3 internal timeout handling, there can always be cases of such an exception on writes, which applications that can't afford to just error, need to handle by implementing their own rollback+retry.

I also conclude from your observation of internal timecode handling, that the higher incidence of such an error when using a wrapping transaction is either coincidental or sqlite3 internal behaviour.

Thank you very much for the clarification.

WiseLibs / better-sqlite3

Encountering SQLITE_BUSY (or SQLITE_BUSY_SNAPSHOT) on multi process concurrent queries only when using transactions #1067