duckdb / sqlite_scanner

DuckDB extension to read and write to SQLite databases
MIT License
189 stars 19 forks source link

A question about performance #74

Open hs41-18 opened 7 months ago

hs41-18 commented 7 months ago

Hi. Many thanks for all the work you are doing.

I have a question about getting data from SQLite into Duckdb. In SQLite, I have tables/views with a massive amount of rows 10 billion, transactional data. What is the easiest way to copy those tables into Duckdb to perform some analytics?

Nintorac commented 5 months ago

My experience here;

I am using a python arrow UDF to process results from a query against the sqlite3 source. I'm seeing many small vectors of 1 or 2 records going through (as opposed to the 2048 you would expect at full capacity). I have an expensive setup in my UDF so it kills performance.

I wonder if this behavior would match with internal duckdb operations.

I get larger vectors when accessing the sqlite db in order (wrt the rowid), so maybe a copy wouldn't suffer from the performace penalty

Is there a way to control the vector size here?