duckdb / duckdb-wasm

WebAssembly version of DuckDB
https://shell.duckdb.org
MIT License

Asynchronous HTTP reads #1723

Open ravwojdyla opened 2 months ago

ravwojdyla commented 2 months ago

https://github.com/duckdb/duckdb-wasm/issues/381 has some great observations on the sequential nature of the HTTP range requests. We have a largish Parquet file (~500 MB) with 34 row groups. Downloading the whole file takes less time than performing sequential range reads on a small subset of columns, even though the range reads download less data. I could not find a dedicated open issue for asynchronous reads.

... we're (not yet) fully async during I/O. This has a few far-reaching requirements for the query execution model that haven't been tackled yet. Right now, we're always sitting in a C++ callstack when doing I/O which restricts us to single blocking http reads (via XHR). Threads would offer an escape hatch here but they're immediately bringing up the problems with SharedArrayBuffers and cross-origin-isolation. I'd love to implement the web filesystem using multiple concurrent fetches but that's not quite possible today.

Originally posted by @ankoh in https://github.com/duckdb/duckdb-wasm/issues/381#issuecomment-968179932
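For illustration, here is a rough sketch of what "multiple concurrent fetches" could look like in plain TypeScript, outside of duckdb-wasm. It is not duckdb-wasm's API or its web filesystem: `ByteRange`, `rangeHeader`, `RangeFetcher`, `httpRangeFetcher` and `fetchRanges` are hypothetical names made up for this example, and it assumes the server honours `Range` headers with `206 Partial Content`.

```typescript
// Hypothetical sketch, not part of duckdb-wasm.
// A byte range inside a remote file, e.g. one column chunk of a row group.
interface ByteRange {
  offset: number; // first byte to read
  length: number; // number of bytes to read
}

// Build the value of an HTTP Range header (the end position is inclusive).
function rangeHeader(r: ByteRange): string {
  return `bytes=${r.offset}-${r.offset + r.length - 1}`;
}

// Injectable transport so the concurrency logic can be exercised without a network.
type RangeFetcher = (url: string, range: string) => Promise<Uint8Array>;

// Default transport based on the standard fetch API.
const httpRangeFetcher: RangeFetcher = async (url, range) => {
  const resp = await fetch(url, { headers: { Range: range } });
  if (resp.status !== 206) {
    // A 200 here would mean the server ignored the Range header.
    throw new Error(`expected 206 Partial Content, got ${resp.status}`);
  }
  return new Uint8Array(await resp.arrayBuffer());
};

// Start all range requests at once and wait for every one of them.
// Results come back in the same order as `ranges`.
async function fetchRanges(
  url: string,
  ranges: ByteRange[],
  fetchRange: RangeFetcher = httpRangeFetcher,
): Promise<Uint8Array[]> {
  return Promise.all(ranges.map((r) => fetchRange(url, rangeHeader(r))));
}
```

The catch described in the quote still applies: this only works from code that can `await`, whereas a synchronous C++ callstack compiled to WebAssembly cannot suspend on a promise, so a sketch like this cannot simply be dropped into the current blocking read path.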