Closed breckcs closed 2 months ago
Thanks for the detailed report! I've pushed a fix in #83. The reason this appears to work in the Rust SQLite driver is that it specifies a busy timeout by default. SQLite's default behavior is to return `SQLITE_BUSY` (i.e. `database is locked`) when a locked database is encountered, but with a busy timeout set, the system can instead wait and periodically retry for a limited amount of time.
In the linked PR I have made the busy timeout configurable and set a default busy timeout of 5 seconds, similar to the rusqlite driver, as that seems like a reasonable default.
@Mytherin, thank you for looking into this.
I updated my test project to include your fixes from #83, but, unfortunately, it doesn't fix the problem.
Tests 3, 4, and 5 (described above) still fail the same way, including the access to undefined memory
crash in test 4.
Interestingly, test 6 now passes (using the DuckDB client to both read and write with WAL enabled). Perhaps this also suggests some binary incompatibility issues between `duckdb-rs` and `rusqlite`?
Apologies - you are correct. I had not run the original test suite with the new extension, but instead ran with DuckDB running in a separate process from `rusqlite`, and that works as expected. The problem occurs specifically when running `rusqlite` and DuckDB with the sqlite extension in the same process.
I think your suspicion is correct and that there is something going wrong with conflicting versions of SQLite within the same process. When disabling the `bundled` feature of `rusqlite` I can no longer reproduce the issue and it seems to work correctly:
```
$ cargo test sqlite_writer_duckdb_reader_with_wal -- --nocapture
DuckDB COUNT : 9999
DuckDB COUNT : 10000
test tests::test_sqlite_writer_duckdb_reader_with_wal ... ok
test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 5 filtered out; finished in 13.69s
```
```diff
diff --git a/Cargo.lock b/Cargo.lock
index b257ebb..5117edd 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -718,7 +718,6 @@ version = "0.28.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "0c10584274047cb335c23d3e61bcef8e323adae7c5c8c760540f73610177fc3f"
 dependencies = [
- "cc",
  "pkg-config",
  "vcpkg",
 ]
diff --git a/Cargo.toml b/Cargo.toml
index 7ee27d9..9ca16c0 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -4,5 +4,5 @@ version = "0.1.0"
 edition = "2021"

 [dependencies]
-rusqlite = { version = "0.31.0", features = ["bundled", "limits"] }
+rusqlite = { version = "0.31.0", features = ["limits"] }
 duckdb = { version = "0.10.1", features = ["bundled"]}
\ No newline at end of file
```
As to why that works - I could not figure it out with complete certainty, but I do have a theory. The problem could be that both `rusqlite` and DuckDB's sqlite extension bundle their own version of SQLite. As a result, when running your test project, there are actually two copies of SQLite running concurrently within the same process: the version bundled by `rusqlite`, and the version bundled by the DuckDB SQLite extension. Due to the way that shared libraries work (at least on macOS), these are isolated, and have their own copies of all global/static variables.
SQLite's concurrency model is based on locks. In particular, it is not actually possible to safely/correctly read from a SQLite database file while it is being written to. SQLite resolves this using locks at two levels: file system level locks (using `fcntl`) and mutexes within the same process.
For concurrent access between processes, file system level locks are used. File system level locks on Unix (macOS/Linux) are held at the process level rather than the file handle level. That means both running SQLite instances (`rusqlite` and DuckDB) successfully obtain a lock on the database file at the same time, despite working with different file handles.
For concurrent access in the same process, mutexes are used. These mutexes are coordinated within the process using global variables. However, because there are two copies of SQLite that each have their own version of these global variables, they do not see each other's mutexes. As such, both copies successfully obtain the mutex lock at the same time as well.
Now as to why this throws a bus error, I'm still a bit puzzled. Particularly because the bus error seems to happen in the writing thread (as you mentioned the bad access happens in `walIndexAppend` from `rusqlite`). I do have a suspicion there as well - DuckDB opens a separate transaction for each read. So what is actually happening for each read iteration within SQLite is:
```sql
BEGIN TRANSACTION;
SELECT COUNT(*) FROM users;
COMMIT;
```
Maybe beginning/committing a transaction in WAL mode has some sort of effect on the WAL that causes the problem to happen there.
I've also succeeded in reproducing this in Python using the following script:
```python
import duckdb
import sqlite3
import threading
import time
import os

sqlite_db = 'sqlite_writer_duckdb_reader_no_wal.db'
insert_count = 10000

def sqlite_write_thread():
    con = sqlite3.connect(sqlite_db)
    con.execute("PRAGMA journal_mode='WAL'")
    con.execute('CREATE TABLE IF NOT EXISTS users (name VARCHAR, age INTEGER);')
    for i in range(insert_count):
        time.sleep(0.001)
        con.execute("INSERT INTO users VALUES ('Alice', (?))", (i,))
        con.commit()
    res = con.execute('SELECT COUNT(*) FROM users').fetchall()
    if res[0][0] != insert_count:
        print(f"Failed to insert all rows - got {res[0][0]} rows instead of {insert_count}")
        exit(1)
    print('Finished inserting')

def duckdb_read_thread():
    con = duckdb.connect(config={'allow_unsigned_extensions': True})
    con.sql(f"ATTACH '{sqlite_db}' AS sqlite_db (TYPE SQLITE, READ_ONLY);")
    while True:
        time.sleep(0.01)
        res = con.sql('SELECT COUNT(*) FROM sqlite_db.users').fetchall()
        print(res[0][0])
        if res[0][0] == insert_count:
            break

try:
    os.remove(sqlite_db)
except:
    pass

write_thread = threading.Thread(target=sqlite_write_thread)
read_thread = threading.Thread(target=duckdb_read_thread)
write_thread.start()
time.sleep(1)
read_thread.start()
write_thread.join()
read_thread.join()
```
Which similarly results in either a bus error or a bad access, always in `walIndexAppend` in the writing thread.
I will investigate more - in particular, I'm curious to see whether I can trigger this using a C program that makes only SQLite API calls - but the more I look into it, the more this looks like a problem in SQLite itself, and I'm not sure there is a fix other than making sure not to run multiple copies of SQLite in the same process.
Looking around more, this actually seems to be a known issue in SQLite itself: https://www.sqlite.org/howtocorrupt.html#multiple_copies_of_sqlite_linked_into_the_same_application
> **Posix advisory locks canceled by a separate thread doing close()**
>
> The default locking mechanism used by SQLite on unix platforms is POSIX advisory locking. Unfortunately, POSIX advisory locking has design quirks that make it prone to misuse and failure. In particular, any thread in the same process with a file descriptor that is holding a POSIX advisory lock can override that lock using a different file descriptor. One particularly pernicious problem is that the close() system call will cancel all POSIX advisory locks on the same file for all threads and all file descriptors in the process.
>
> So, for example, suppose a multi-thread process has two or more threads with separate SQLite database connections to the same database file. Then a third thread comes along and wants to read something out of that same database file on its own, without using the SQLite library. The third thread does an open(), a read() and then a close(). One would think this would be harmless. But the close() system call caused the locks held on the database by all the other threads to be dropped. Those other threads have no way of knowing that their locks have just been trashed (POSIX does not provide any mechanism to determine this) and so they keep on running under the assumption that their locks are still valid. This can lead to two or more threads or processes trying to write to the database at the same time, resulting in database corruption.
>
> Note that it is perfectly safe for two or more threads to access the same SQLite database file using the SQLite library. The unix drivers for SQLite know about the POSIX advisory locking quirks and work around them. This problem only arises when a thread tries to bypass the SQLite library and read the database file directly.
>
> **Multiple copies of SQLite linked into the same application**
>
> As pointed out in the previous paragraph, SQLite takes steps to work around the quirks of POSIX advisory locking. Part of that work-around involves keeping a global list (mutex protected) of open SQLite database files. But, if multiple copies of SQLite are linked into the same application, then there will be multiple instances of this global list. Database connections opened using one copy of the SQLite library will be unaware of database connections opened using the other copy, and will be unable to work around the POSIX advisory locking quirks. A close() operation on one connection might unknowingly clear the locks on a different database connection, leading to database corruption.
>
> The scenario above sounds far-fetched. But the SQLite developers are aware of at least one commercial product that was released with exactly this bug. The vendor came to the SQLite developers seeking help in tracking down some infrequent database corruption issues they were seeing on Linux and Mac. The problem was eventually traced to the fact that the application was linking against two separate copies of SQLite. The solution was to change the application build procedures to link against just one copy of SQLite instead of two.
I've found other reports of people encountering the same issue when using SQLite: https://github.com/tensorflow/tensorboard/issues/1467
This is a rather gnarly issue. The only way to solve it nicely is to stop bundling SQLite with the SQLite extension, and instead link against the SQLite that is already present for a given platform (e.g. link to `libsqlite3-sys` for Rust, the bundled sqlite library for Python, etc.). But not bundling SQLite means the extension is not usable unless the user manually links in SQLite, which hurts usability in the general case. We also can't reliably detect whether another version of SQLite is present, except perhaps when it has been loaded using `RTLD_GLOBAL`, in which case we could check the global namespace for SQLite symbols.
In the `RTLD_GLOBAL` case we could do something clever: rename the symbols in our bundled version to something like `bundled_sqlite3_open_v2`, etc. Then we could first check whether an existing SQLite installation is present by calling `dlopen(nullptr)` followed by `dlsym(main_lib, "sqlite3_open_v2")`, and afterwards either use the already-present global symbols, or fall back to our bundled implementation if none is found. That might solve the issue at least on platforms where the sqlite symbols are loaded into the global namespace.
Removing the `bundled` feature of `rusqlite` makes the tests that write with SQLite and read with DuckDB pass (tests 3 and 4 described above): https://github.com/breckcs/duckdb_sqlite_scanner_concurrency_tests/pull/2.
The performance in WAL mode with a SQLite writer and a DuckDB reader seems the same as with a SQLite writer and a SQLite reader (comparing tests 2 and 4), but without WAL mode, the DuckDB reader is significantly slower (comparing tests 1 and 3).
I opened a PR to update the DuckDB documentation with considerations for concurrency and included a warning about linking to two copies of the same SQLite library in the same process: https://github.com/duckdb/duckdb-web/pull/2742.
Interestingly, the last two tests that both read and write to SQLite using only DuckDB still don't work:

- `test_duckdb_writer_duckdb_reader_no_wal` fails with `database is locked` errors.
- `test_duckdb_writer_duckdb_reader_with_wal` passes, but runs very slowly and the counters do not update for long periods of time.
What happens?

- `attempt to write a readonly database` errors
- `EXC_BAD_ACCESS (SIGBUS)` crashes
- `database is locked` errors

I am unable to produce similar errors when reading concurrently with a SQLite client.
Six test scenarios are detailed below.
I'm happy to help with further debugging and/or updates to code or documentation related to this issue.
To Reproduce
:white_check_mark: 1) SQLite Writer, SQLite Reader, No WAL
Run the test:
It should pass:
:white_check_mark: 2) SQLite Writer, SQLite Reader, With WAL
Run the test:
It should pass:
:red_circle: 3) SQLite Writer, DuckDB Reader, No WAL
Run the test:
It should fail:
I don't expect DuckDB to be mutating anything in this test, but even if I remove the `READ_ONLY` from the `ATTACH` statement, I get a similar error.

DuckDB is ultimately using the SQLite client to read, so this scenario should be equivalent to the first test scenario.
Questions:

- Does the `sqlite3.c` used by both Rusqlite and duckdb-rs need to be binary compatible?

:red_circle: 4) SQLite Writer, DuckDB Reader, With WAL
Run the test:
The process crashes with a `EXC_BAD_ACCESS (SIGBUS)` error.

If I run it in the debugger, I get the following console output:
The call stack is as follows:
It always fails in the `walIndexAppend` function in the `sqlite3.c` client.

DuckDB is ultimately using the SQLite client to read, so this scenario should be equivalent to the second test scenario.
Questions:

- Does the `sqlite3.c` used by both Rusqlite and duckdb-rs need to be binary compatible?

I'm using RustRover as my debugger.
:red_circle: 5) DuckDB Writer, DuckDB Reader, No WAL
In an attempt to eliminate binary compatibility issues between the Rusqlite and duckdb-rs clients, I wrote to the SQLite database and read from the SQLite database using only the duckdb-rs client.
Run the test:
It fails with:
There have been other reports of the `database is locked` error. See #78.

:red_circle: 6) DuckDB Writer, DuckDB Reader, With WAL
Just for completeness, I also used only the duckdb-rs client to read and write to the SQLite database with WAL mode enabled. Note, I don't see a way to create the SQLite database with WAL enabled using the DuckDB client, so the database is first created using the Rusqlite client, then closed and opened with the duckdb-rs client before writing to it.
Run the test:
This fails with:
OS: Mac OS X arm64

SQLite Version: 3.44.0

DuckDB Version: 0.9.2

DuckDB Client: duckdb-rs (Rust)

Full Name: Colin Breck

Affiliation: Tesla
Have you tried this on the latest `main` branch?

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?