cloudfuse-io / buzz-rust

Serverless query engine
MIT License

Optimize parquet chunk downloading strategy #10

Open rdettai opened 3 years ago

rdettai commented 3 years ago

The Parquet table downloads each column chunk individually. If a query reads a large proportion of the columns and the file contains many row groups, this results in many small downloads: one per selected column in every row group.
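To make the cost concrete, here is a minimal sketch of one possible grouping approach. It is not buzz-rust's actual code. It sorts the byte ranges of the selected column chunks and merges neighbours whose gap is at most a threshold, accepting some wasted bytes in exchange for fewer requests. `ByteRange`, `coalesce` and `max_gap` are hypothetical names, and the file layout in `main` is made up for illustration.

```rust
/// Hypothetical byte range of one column chunk inside the Parquet file.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ByteRange {
    start: u64,
    len: u64,
}

impl ByteRange {
    fn end(&self) -> u64 {
        self.start + self.len
    }
}

/// Merge ranges into fewer, larger downloads. Two ranges are merged when the
/// gap between them is at most `max_gap` bytes; the gap bytes are downloaded
/// and then discarded.
fn coalesce(mut ranges: Vec<ByteRange>, max_gap: u64) -> Vec<ByteRange> {
    ranges.sort_by_key(|r| r.start);
    let mut merged: Vec<ByteRange> = Vec::new();
    for r in ranges {
        match merged.last_mut() {
            // Close enough to the previous range: extend it instead of adding a request.
            Some(last) if r.start <= last.end() + max_gap => {
                let new_end = last.end().max(r.end());
                last.len = new_end - last.start;
            }
            _ => merged.push(r),
        }
    }
    merged
}

fn main() {
    // Made-up layout: 3 row groups x 4 columns, 1 MiB per chunk, stored contiguously.
    let chunk: u64 = 1 << 20;
    let mut ranges = Vec::new();
    for rg in 0..3u64 {
        for col in 0..4u64 {
            if col == 2 {
                continue; // the query does not read column 2
            }
            let start = (rg * 4 + col) * chunk;
            ranges.push(ByteRange { start, len: chunk });
        }
    }
    println!("individual downloads: {}", ranges.len());
    println!("coalesced (no gap allowed): {}", coalesce(ranges.clone(), 0).len());
    println!("coalesced (1 MiB gap allowed): {}", coalesce(ranges, chunk).len());
}
```

With this layout the 9 individual downloads drop to 4 when only touching ranges are merged, and to a single request when a 1 MiB gap (the skipped column) may be read and thrown away. Choosing `max_gap` is the real trade-off: larger values mean fewer requests but more wasted bytes.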

A strategy could be implemented to group the downloads of column chunks if