apache / iceberg-rust

Apache Iceberg
https://rust.iceberg.apache.org/
Apache License 2.0
673 stars 159 forks source link

fix: do not sort indices for `ProjectionMask::leaves` #682

Closed wcy-fdu closed 3 weeks ago

wcy-fdu commented 4 weeks ago

The order of indices passed into ProjectionMask::leaves is meaningless, because "repeated or out of order indices will not impact the final mask"(refer to https://docs.rs/crate/parquet/latest/source/src/arrow/mod.rs#179). Therefore, we don't need to perform an extra sort, which increases the overhead.

wcy-fdu commented 4 weeks ago

cc @liurenjie1024 @Xuanwo for awareness.

liurenjie1024 commented 4 weeks ago

Looks ut failed, please help to fix it.

sdd commented 3 weeks ago

@liurenjie1024 or @Xuanwo are you able to re-run the tests on this? I can't explain why the test would fail the way that it has in CI and when I reproduce the change from this PR on main locally, the test passes

liurenjie1024 commented 3 weeks ago

@liurenjie1024 or @Xuanwo are you able to re-run the tests on this? I can't explain why the test would fail the way that it has in CI and when I reproduce the change from this PR on main locally, the test passes

Hi, @sdd I can reproduce the test failure on my mac. Let me try to rereun with update branch.

sdd commented 3 weeks ago

Strange, I was running it on M1 Mac too and it all passed! Oh well, glad you can repro

liurenjie1024 commented 3 weeks ago

cc @wcy-fdu Would you mind to take a look at the ut failure?

wcy-fdu commented 3 weeks ago

cc @wcy-fdu Would you mind to take a look at the ut failure?

Sure, will fix ut outside of work time.

wcy-fdu commented 3 weeks ago

After investigation, I found that although ProjectionMask::leaves() does not require column_indices to be in order, creating PredicateConverter requires it to be in order, so it has to be sorted here anyway. Therefore, it seems unnecessary to remove sort.