duckdb / sqlite_scanner

DuckDB extension to read and write to SQLite databases
MIT License
189 stars 19 forks source link

Duckdbvfs: Forward to DuckDB file system via vfs mechanism #66

Open carlopi opened 9 months ago

carlopi commented 9 months ago

Sqlite supports virtual file systems, this is a simple implementation, on top of demovfs, of the duckdb file system.

This allows to read sqlite database, for example over http(s):// or s3://. Fixes #39 and it's a big step towards solving also https://github.com/duckdb/duckdb-wasm/issues/1213.

FROM sqlite_scan('https://raw.githubusercontent.com/duckdblabs/sqlite_scanner/main/data/db/sakila.db', 'film');

Implementation is mostly copy-pasted from demovfs (a demo virtual file system provided with sqlite) to use DuckDB FileHandles and their methods.

Implementation is in src/sqlite/duckdbvfs.cpp, places I modified are the one with mention of duckdb::FileHandle or FIXME.

Performance

One concern is that currently performance for sqlite_attach over the wire is not amazing, like 20 seconds for

CALL sqlite_attach('https://raw.githubusercontent.com/duckdblabs/sqlite_scanner/main/data/db/sakila.db');

and I believe this is due to 2 factors:

Behaviour seems to be the same in native and over the wire, as in the same segments are read in the same order, so I am not sure if that's something that should be solved at at the virtual file system layer or in general.

Local performances seems unchanged, and scan_sqlite performances seems good also over the wire.

carlopi commented 9 months ago

Thanks for the review, it should be possible to have multiple file systems at the same time, I took a shortcut there mostly to increase testing surface, will give it another proper look at it later (and ping Max on the lock).

carlopi commented 8 months ago

Update: I have a branch with most problem solved, last one is the concurrency issue, that's also the most serious.