apache / iceberg-rust

Apache Iceberg
https://rust.iceberg.apache.org/
Apache License 2.0

Concurrent table scans #373

Open sdd opened 1 month ago

sdd commented 1 month ago

This is a bit of an experiment to see how things could look if we tried to:

Beyond the existing tests we have for TableScan, I'd like to add some unit tests to confirm that this behaves as expected, as well as an integration / performance test that can quantify any performance improvements (or regressions 😅) that these changes bring.

Let me know what you all think.

sdd commented 1 month ago

I've updated this to ditch the concurrency when processing ManifestEntry items within a single Manifest; those entries are now produced asynchronously but sequentially. I've kept the limited concurrency when processing the ManifestFiles in the ManifestList of the scan's snapshot.

I've kept the approach of using an mpsc channel with a spawned task, where that task uses try_for_each_concurrent to achieve the concurrency. Without the channel and spawned task, we'd need an async closure, which requires unstable Rust. With the spawned task, we only need an async block, which is available in stable Rust.