delta-io / delta-kernel-rs

A native Delta implementation for integration with any query engine
Apache License 2.0

API to expose how many files were skipped in data skipping #491

Open zachschuermann opened 2 days ago

zachschuermann commented 2 days ago

Ideally, during EXPLAIN ANALYZE queries or similar, we could configure a mode in which log replay tracks all files. Kernel could then expose the number and total size of all files in the table, along with the number and total size of the files included in the scan (or skipped).

Note that this should be opt-in, since it will have a non-trivial cost. The hashmap in log replay is one of the few spots of unbounded memory in kernel, and if we suddenly have to track many more files, that hashmap grows accordingly.
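
To make the trade-off concrete, here is a rough sketch of what an opt-in mode could look like. Everything in it is hypothetical: `SkippingMetrics`, `AddFile`, `LogReplay`, and `process` are illustrative names, not existing delta-kernel-rs APIs, and the replay logic is heavily simplified (add actions only, no remove handling). It also assumes that in the default mode only files that survive data skipping need to be remembered for deduplication.

```rust
use std::collections::HashSet;

/// Hypothetical counters an engine could read after a scan.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct SkippingMetrics {
    pub files_seen: u64,
    pub bytes_seen: u64,
    pub files_skipped: u64,
    pub bytes_skipped: u64,
}

/// Simplified stand-in for an add action (path and size only).
pub struct AddFile {
    pub path: String,
    pub size: u64,
}

/// Simplified stand-in for log replay state.
pub struct LogReplay {
    /// Paths already seen; this set is the unbounded-memory spot.
    seen: HashSet<String>,
    /// `Some` only when the caller opted in to metrics collection.
    metrics: Option<SkippingMetrics>,
}

impl LogReplay {
    pub fn new(collect_metrics: bool) -> Self {
        Self {
            seen: HashSet::new(),
            metrics: collect_metrics.then(SkippingMetrics::default),
        }
    }

    /// Returns true if `file` should be part of the scan.
    /// `keep` stands in for the data skipping predicate.
    pub fn process(&mut self, file: &AddFile, keep: impl Fn(&AddFile) -> bool) -> bool {
        let selected = keep(file);
        match self.metrics.as_mut() {
            // Default mode: only remember files that survive skipping.
            None => selected && self.seen.insert(file.path.clone()),
            // Opt-in mode: remember every file so each is counted once.
            Some(m) => {
                if !self.seen.insert(file.path.clone()) {
                    return false; // duplicate path, already counted
                }
                m.files_seen += 1;
                m.bytes_seen += file.size;
                if !selected {
                    m.files_skipped += 1;
                    m.bytes_skipped += file.size;
                }
                selected
            }
        }
    }

    pub fn metrics(&self) -> Option<&SkippingMetrics> {
        self.metrics.as_ref()
    }

    /// Number of paths held in memory, to compare the two modes.
    pub fn tracked_paths(&self) -> usize {
        self.seen.len()
    }
}
```

In this sketch, both modes select the same files for the scan. The difference is that the opt-in mode also inserts skipped files into `seen`, which is exactly the extra memory cost mentioned above.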