https://github.com/duckdb/duckdb-wasm/issues/381 has some great observations on the sequential nature of the HTTP range requests. We have a largish Parquet file (~500 MB) with 34 row groups. Downloading the whole file takes less time than performing sequential range reads on a small subset of columns, even though less data is transferred with the range reads. I could not find a dedicated open issue for asynchronous reads.
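To make the observation concrete, here is a rough sketch of the access pattern being described: one HTTP range request per byte range (for example, per column chunk), each issued only after the previous one has finished. This is not duckdb-wasm's actual code. The names `readRangesSequentially`, `FetchLike` and `ByteRange` are hypothetical, and the global `fetch` can be passed as `fetchFn`. With 34 row groups and several columns, every request pays its own round-trip latency, which is one plausible way the total can exceed the time needed to download the whole file in a single request.

```typescript
// Minimal shape of the standard fetch() that this sketch relies on
// (assumption: only `status` and `arrayBuffer()` of the response are used).
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// A byte range in the remote file, e.g. one column chunk of a row group.
interface ByteRange {
  offset: number;
  length: number;
}

// Hypothetical helper: reads the ranges strictly one after another.
// Request i + 1 is only issued once request i has fully completed,
// so every range pays a full network round trip.
async function readRangesSequentially(
  fetchFn: FetchLike,
  url: string,
  ranges: ByteRange[],
): Promise<Uint8Array[]> {
  const results: Uint8Array[] = [];
  for (const r of ranges) {
    const end = r.offset + r.length - 1; // the end offset of an HTTP Range is inclusive
    const res = await fetchFn(url, { headers: { Range: `bytes=${r.offset}-${end}` } });
    if (res.status !== 206) {
      throw new Error(`expected 206 Partial Content, got ${res.status}`);
    }
    results.push(new Uint8Array(await res.arrayBuffer()));
  }
  return results;
}
```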
... we're not (yet) fully async during I/O. This has a few far-reaching requirements for the query execution model that haven't been tackled yet.
Right now, we're always sitting in a C++ call stack when doing I/O, which restricts us to single, blocking HTTP reads (via XHR).
Threads would offer an escape hatch here, but they immediately bring up the problems with SharedArrayBuffers and cross-origin isolation.
I'd love to implement the web filesystem using multiple concurrent fetches but that's not quite possible today.
Originally posted by @ankoh in https://github.com/duckdb/duckdb-wasm/issues/381#issuecomment-968179932
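The "multiple concurrent fetches" idea from the quoted comment can be sketched on the JavaScript side as a small worker pool that keeps several HTTP range requests in flight at once. This is a minimal sketch under stated assumptions, not duckdb-wasm's web filesystem: `readRangesConcurrently`, `FetchLike`, `ByteRange` and the default limit of 6 are hypothetical. It also sidesteps the real obstacle described in the comment, namely that the reads are currently issued from a synchronous C++ call stack that cannot wait on a Promise.

```typescript
// Minimal shape of the standard fetch() used here (assumption: only
// `status` and `arrayBuffer()` are needed); the global fetch fits it.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// A byte range in the remote file, e.g. one column chunk of a row group.
interface ByteRange {
  offset: number;
  length: number;
}

// Hypothetical helper: fetches all ranges with at most `maxConcurrent`
// requests in flight, returning the buffers in the same order as `ranges`.
async function readRangesConcurrently(
  fetchFn: FetchLike,
  url: string,
  ranges: ByteRange[],
  maxConcurrent = 6, // illustrative value only
): Promise<Uint8Array[]> {
  const results: Uint8Array[] = new Array(ranges.length);
  let next = 0; // index of the next unclaimed range

  // Each worker claims the next range, fetches it, and repeats until none remain.
  // Claiming via `next++` is safe because JavaScript runs this code on one thread.
  async function worker(): Promise<void> {
    while (next < ranges.length) {
      const i = next++;
      const r = ranges[i];
      const end = r.offset + r.length - 1; // HTTP Range end offset is inclusive
      const res = await fetchFn(url, { headers: { Range: `bytes=${r.offset}-${end}` } });
      if (res.status !== 206) {
        throw new Error(`expected 206 Partial Content, got ${res.status}`);
      }
      results[i] = new Uint8Array(await res.arrayBuffer());
    }
  }

  const poolSize = Math.max(1, Math.min(maxConcurrent, ranges.length));
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return results;
}
```

Results are stored by index, so they come back in request order even when responses complete out of order. Capping the pool size keeps the page from queuing dozens of requests against the same origin at once; the right limit would depend on the browser and on whether the server speaks HTTP/2.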