apache / iceberg-rust

Apache Iceberg
https://rust.iceberg.apache.org/
Apache License 2.0

scan: change ErrorKind when table doesn't have snapshots #608

Closed mattheusv closed 2 months ago

mattheusv commented 2 months ago

Previously, the TableScan struct required a Snapshot to plan files, so scanning an empty table without a snapshot returned an error instead of an empty result.

Following the approach of the Java [0] and Python [1] implementations, this commit changes the snapshot property to accept None values and changes the plan_files method to return an empty stream if the snapshot is not present on the PlanContext.

[0] https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotScan.java#L119
[1] https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L1979

Fixes: https://github.com/apache/iceberg-rust/issues/580
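
The change described above can be sketched in a simplified, std-only form. This is not the crate's code: the struct fields, the use of a plain iterator in place of an async stream, and the `String` file paths are all assumptions made for illustration. It only shows the shape of the idea: an optional snapshot, and an empty result when it is absent.

```rust
// Simplified stand-ins (assumptions): the real crate uses async streams and richer types.
struct Snapshot {
    data_files: Vec<String>,
}

struct PlanContext {
    // `None` models a table that has never had a snapshot.
    snapshot: Option<Snapshot>,
}

struct TableScan {
    plan_context: PlanContext,
}

impl TableScan {
    // Returns the files to scan; yields nothing when there is no snapshot.
    fn plan_files(&self) -> Box<dyn Iterator<Item = String> + '_> {
        match &self.plan_context.snapshot {
            None => Box::new(std::iter::empty::<String>()),
            Some(snapshot) => Box::new(snapshot.data_files.iter().cloned()),
        }
    }
}
```

With this shape, callers never see an error for a snapshot-less table; they simply iterate over zero files.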

mattheusv commented 2 months ago

Just some notes for reviewers:

I'm not 100% sure that this is the best approach to fix this issue. I've just tried to follow the approach used in the Java and Python implementations, but I don't know if there is a better way to implement it in Rust.

Another point: I'm a bit confused about where I should write a test case for this issue.

sdd commented 2 months ago

Thanks for the contribution! Do we need to address this inside scan though? Why let someone build a TableScan that will always be useless?

This can be handled instead in the code that invokes table.scan(), without needing to make changes to the scan builder, scan, and context objects just for this edge case.


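use futures::{stream, StreamExt};

let scan_builder = table.scan();
// (customize builder here if required)...

// Note: this treats any build error as "nothing to scan" and returns an empty stream.
let Ok(scan) = scan_builder.build() else {
    return Ok(stream::empty().boxed());
};

scan.plan_files().await

mattheusv commented 2 months ago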

Hi @sdd , thanks for your review.

I agree that it would be better to fix this edge case with a smaller change, but I'm not sure I understand your suggestion correctly.

Is the idea to make the callers of TableScanBuilder.build() handle the case where the table doesn't have any data? scan_builder.build() currently returns a TableScan, and it is TableScan.plan_files that may actually return a stream::empty().boxed(), so I don't know if I'm missing something here. (I'm new to this codebase.)

Just adding another idea: would it make sense to return an error like Error::new(ErrorKind::EmptyTable) when calling TableScanBuilder.build()?

sdd commented 2 months ago

Just to clarify, not having any snapshots is not necessarily the same as not having any data. If there is no current snapshot then there can't be any data, but someone could delete all data from a table, resulting in there being a snapshot, but no data. The existing code would handle this second case just fine - we only need to handle the issue of no snapshots.
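
The distinction drawn above can be made concrete with a small model. The types below (`Table`, `Snapshot`, `ScanState`) are hypothetical stand-ins, not the crate's API; the sketch only illustrates that "no snapshot" and "snapshot with no data" are separate states.

```rust
// Hypothetical, simplified stand-ins for table metadata (assumptions).
struct Snapshot {
    data_files: Vec<String>,
}

struct Table {
    current_snapshot: Option<Snapshot>,
}

#[derive(Debug, PartialEq)]
enum ScanState {
    // The table has never been committed to: there is no snapshot at all.
    NoSnapshot,
    // A snapshot exists, but it references no data (e.g. after deleting all rows).
    EmptySnapshot,
    // A snapshot exists and references this many data files.
    HasData(usize),
}

fn scan_state(table: &Table) -> ScanState {
    match &table.current_snapshot {
        None => ScanState::NoSnapshot,
        Some(s) if s.data_files.is_empty() => ScanState::EmptySnapshot,
        Some(s) => ScanState::HasData(s.data_files.len()),
    }
}
```

Only the `NoSnapshot` case is the one under discussion in this thread; the other two are the cases the existing code is said to handle already.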

mattheusv commented 2 months ago

@sdd I've changed the code to return an ErrorKind::TableWithoutSnapshot instead of FeatureUnsupported. With this, the user can differentiate between a table without snapshots and a table without data. WDYT?
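
A rough sketch of how a caller might use a dedicated error kind such as the one proposed in this comment. Everything here is a self-contained stand-in written for illustration: the `Error`, `ErrorKind`, `TableScan`, `build_scan`, and `planned_files` definitions are assumptions, not the crate's types, and `TableWithoutSnapshot` is only the variant name proposed in this thread.

```rust
use std::fmt;

// Stand-in error types (assumptions), loosely shaped like a kind-plus-message error.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[allow(dead_code)]
enum ErrorKind {
    Unexpected,
    // Variant name as proposed in this thread.
    TableWithoutSnapshot,
}

#[derive(Debug)]
struct Error {
    kind: ErrorKind,
    message: String,
}

impl Error {
    fn new(kind: ErrorKind, message: impl Into<String>) -> Self {
        Self { kind, message: message.into() }
    }

    fn kind(&self) -> ErrorKind {
        self.kind
    }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.kind, self.message)
    }
}

impl std::error::Error for Error {}

// Hypothetical scan that just carries the file list of the current snapshot.
struct TableScan {
    files: Vec<String>,
}

// Building fails with a dedicated kind when there is no snapshot.
fn build_scan(current_snapshot: Option<Vec<String>>) -> Result<TableScan, Error> {
    current_snapshot
        .map(|files| TableScan { files })
        .ok_or_else(|| Error::new(ErrorKind::TableWithoutSnapshot, "table has no snapshot"))
}

// The caller maps only that specific kind to an empty result and propagates the rest.
fn planned_files(current_snapshot: Option<Vec<String>>) -> Result<Vec<String>, Error> {
    match build_scan(current_snapshot) {
        Ok(scan) => Ok(scan.files),
        Err(e) if e.kind() == ErrorKind::TableWithoutSnapshot => Ok(Vec::new()),
        Err(e) => Err(e),
    }
}
```

The point of a dedicated kind is that the caller can branch on `kind()` without inspecting the message text; with a generic kind, the caller could not tell this case apart from other failures.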

sdd commented 2 months ago

We've been very selective when it comes to adding new values to ErrorKind. I'd personally go for Unexpected here - but maybe @liurenjie1024 or @Xuanwo can confirm what would be best.

Xuanwo commented 2 months ago

> We've been very selective when it comes to adding new values to ErrorKind. I'd personally go for Unexpected here - but maybe @liurenjie1024 or @Xuanwo can confirm what would be best.

Yes, I'd go with Unexpected too, unless this error kind is meaningful for users to make decisions.

mattheusv commented 2 months ago

@Xuanwo @sdd could you guys please take a look?