This PR adapts our DeleteFileIndexBenchmark for DVs.
Benchmark                                     (type)     Mode  Cnt  Score   Error  Units
DeleteFileIndexBenchmark.buildIndexAndLookup  partition  ss    10   0.475 ± 0.031  s/op
DeleteFileIndexBenchmark.buildIndexAndLookup  file       ss    10   5.381 ± 0.224  s/op
DeleteFileIndexBenchmark.buildIndexAndLookup  dv         ss    10   3.612 ± 0.201  s/op
Partition-scoped deletes are fastest because the benchmark sets up a table with a small number of deep partitions (50K data files per partition) and only 100 delete files per partition, so the number of delete files differs dramatically across the three cases. We should probably make this benchmark more representative in the future. DVs are faster than file-scoped deletes because they rely on referencedDataFile instead of reconstructing that value from bounds. I'd say the planning performance is acceptable for 2.5M DVs, but we may want to optimize it further.
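To make the size gap concrete, here is a rough back-of-the-envelope sketch. It uses only the numbers quoted above (50K data files per partition, 100 partition-scoped delete files per partition, 2.5M DVs) and assumes one DV per data file, which is my reading of the setup rather than something the benchmark code is quoted as saying here. The class and method names are hypothetical.

```java
public class DeleteCountSketch {
    // Figures taken from the description above.
    static final long DATA_FILES_PER_PARTITION = 50_000L;
    static final long PARTITION_DELETES_PER_PARTITION = 100L;
    static final long TOTAL_DVS = 2_500_000L;

    // Assumption: one DV per data file, so the DV count equals the data file count.
    static long partitionCount(long totalDataFiles, long dataFilesPerPartition) {
        return totalDataFiles / dataFilesPerPartition;
    }

    // Total partition-scoped delete files across all partitions.
    static long partitionScopedDeletes(long partitions, long deletesPerPartition) {
        return partitions * deletesPerPartition;
    }

    public static void main(String[] args) {
        long partitions = partitionCount(TOTAL_DVS, DATA_FILES_PER_PARTITION);
        long partitionDeletes =
            partitionScopedDeletes(partitions, PARTITION_DELETES_PER_PARTITION);

        System.out.println("partitions = " + partitions);
        System.out.println("partition-scoped delete files = " + partitionDeletes);
        System.out.println("DVs = " + TOTAL_DVS);
        System.out.println("DVs per partition-scoped delete file = "
            + (TOTAL_DVS / partitionDeletes));
    }
}
```

Under that assumption the partition case handles about 5,000 delete files versus 2.5M DVs, roughly a 500x difference, so the partition score is not directly comparable to the other two.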
This work is part of #11122.