apache / iceberg-rust

Apache Iceberg
https://rust.iceberg.apache.org/
Apache License 2.0
474 stars 97 forks source link

Refactor: Extract `partition_filters` from `ManifestEvaluator` #360

Closed marvinlanhenke closed 2 months ago

marvinlanhenke commented 2 months ago

Which issue does this PR close?

Closes #359

Rationale for this change

The partition_filter (inclusive projection) is not only required by the ManifestEvaluator but also by the (incoming) ExpressionEvaluator. In order to avoid duplicate code as well as unnecessary computation of the partition filters, we should extract the computation of the partition filters and cache the results per partition_spec_id.

With this refactor we (hopefully) should be able to integrate the ExpressionEvaluator and the InclusiveMetricsEvaluator more easily.

What changes are included in this PR?

Are these changes tested?

Yes. Unit tests are included. If PR is okay; I will add some basis tests for the new structs (FileScanStreamContext, etc.) as well.

marvinlanhenke commented 2 months ago

@sdd @Fokko @liurenjie1024 PTAL

marvinlanhenke commented 2 months ago

@Fokko thanks for clearing this up and thanks for the review. This was is the line that led me to believe we also use the inclusive projection in the expression evaluator.

Since I also wanted to tackle the impl of the ExpressionEvaluator I'd be grateful if you could tell me where I went wrong? My understanding is that the partition_expr is passed into the ExpressionEvaluator and that the expr is taken from the DefaultDict of partition filters which contains the inclusive projections?

However, if I understand correctly we still needed to extract the partition filters since they will be needed for positional deletes.

Fokko commented 2 months ago

Since I also wanted to tackle the impl of the ExpressionEvaluator I'd be grateful if you could tell me where I went wrong? My understanding is that the partition_expr is passed into the ExpressionEvaluator and that the expr is taken from the DefaultDict of partition filters which contains the inclusive projections?

You're right, we need both of them 👍

However, if I understand correctly we still needed to extract the partition filters since they will be needed for positional deletes.

Yes, we need it there as well :) Let's move this forward. Thanks @marvinlanhenke for working on this, and @sdd for the prompt review 🙌

marvinlanhenke commented 2 months ago

@sdd thanks for the review and @Fokko thanks for merging this.

There is still an unresolved issue though about "not caching" the partition_schemas. I'll create an issue for that; so we can discuss the approach first; Since I defenitely could net some help here in terms of idiomatic Rust.