mattheusv closed 2 months ago
Just some notes for reviewers:
I'm not 100% sure this is the best approach to fix this issue. I've just tried to follow the same approach used in the Java and Python implementations, but I don't know if there is a better way to implement it in Rust.
Another point: I'm a bit confused about where I should write a test case for this issue.
Thanks for the contribution! Do we need to address this inside scan, though? Why let someone build a TableScan that will always be useless? This can instead be handled in the code that invokes table.scan(), without needing to change the scan builder, scan, and context objects just for this edge case.
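```rust
use futures::{stream, StreamExt};
let scan_builder = table.scan();
// (customize builder here if required)...
// If build() fails (e.g. the table has no current snapshot), return an empty stream.
let Ok(scan) = scan_builder.build() else {
    return Ok(stream::empty().boxed());
};
scan.plan_files().await
```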
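Below is a self-contained sketch of the caller-side pattern from the snippet above. It runs without an async runtime or the `futures` crate because it relies on simplified, hypothetical stand-ins (`Table`, `TableScan`, `Snapshot`, `ScanError`, `plan_or_empty`). It also uses a `Vec<String>` of file names where the real code has a `FileScanTaskStream`. None of these are the iceberg-rust definitions; they only mirror the shape of the API.

```rust
// Hypothetical, simplified stand-ins; not the real iceberg-rust types.
#[derive(Clone)]
struct Snapshot {
    data_files: Vec<String>,
}

struct Table {
    current_snapshot: Option<Snapshot>,
}

#[allow(dead_code)]
#[derive(Debug)]
struct ScanError(String);

struct TableScan {
    snapshot: Snapshot,
}

impl Table {
    /// Mirrors the behaviour discussed here: building a scan fails when
    /// the table has no current snapshot.
    fn build_scan(&self) -> Result<TableScan, ScanError> {
        let snapshot = self
            .current_snapshot
            .clone()
            .ok_or_else(|| ScanError("Can't scan table without snapshots".to_string()))?;
        Ok(TableScan { snapshot })
    }
}

impl TableScan {
    /// Stand-in for `plan_files`: just returns the snapshot's file names.
    fn plan_files(&self) -> Vec<String> {
        self.snapshot.data_files.clone()
    }
}

/// Caller-side handling: a failed build becomes an empty plan.
fn plan_or_empty(table: &Table) -> Vec<String> {
    let Ok(scan) = table.build_scan() else {
        return Vec::new();
    };
    scan.plan_files()
}
```

This design has a trade-off. `let ... else` discards every error returned by the builder, not only the missing-snapshot one, so a genuine failure would also be turned silently into an empty plan.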
Hi @sdd, thanks for your review.
I agree that it would be better to fix this edge case with a smaller change, but I'm not sure I understand your suggestion correctly.
Is the idea to make the callers of TableScanBuilder.build() handle the case where the table doesn't have any data? Currently scan_builder.build() returns a TableScan, and it's TableScan.plan_files that may actually return stream::empty().boxed(), so am I missing something here? (I'm new to this codebase.)
Just adding another idea: would it make sense to return an error like Error::new(ErrorKind::EmptyTable) when calling TableScanBuilder.build()?
Just to clarify, not having any snapshots is not necessarily the same as not having any data. If there is no current snapshot then there can't be any data, but someone could delete all data from a table, resulting in there being a snapshot, but no data. The existing code would handle this second case just fine - we only need to handle the issue of no snapshots.
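A minimal sketch of this distinction follows. It uses a hypothetical `Snapshot` type and `classify` helper that are not part of iceberg-rust. Only the `None` case needs special handling, because a snapshot with an empty file list already plans to nothing.

```rust
// Hypothetical, simplified model; not the real iceberg-rust Snapshot.
#[derive(Debug)]
struct Snapshot {
    data_files: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum TableState {
    NoSnapshot,
    SnapshotWithoutData,
    HasData(usize),
}

fn classify(current_snapshot: Option<&Snapshot>) -> TableState {
    match current_snapshot {
        // No current snapshot at all: there can't be any data.
        None => TableState::NoSnapshot,
        // A snapshot exists (e.g. after all data was deleted) but references
        // no data files; existing planning code already yields an empty result.
        Some(s) if s.data_files.is_empty() => TableState::SnapshotWithoutData,
        Some(s) => TableState::HasData(s.data_files.len()),
    }
}
```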
@sdd I've changed the code to return an ErrorKind::TableWithoutSnapshot instead of FeatureUnsupported. With this, the user can differentiate between a table without snapshots and a table without data. WDYT?
We've been very selective when it comes to adding new values to ErrorKind. I'd personally go for Unexpected here, but maybe @liurenjie1024 or @Xuanwo can confirm what would be best.
Yes, I'm for Unexpected too, unless this error kind is meaningful for users to make decisions.
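The sketch below shows when a dedicated kind matters to callers. It assumes a hypothetical `ErrorKind`/`Error` pair: the `TableWithoutSnapshot` name is the one proposed in this PR, and everything else is illustrative rather than the real iceberg-rust API. With a dedicated variant, a caller can map only the missing-snapshot case to an empty result and still propagate every other failure. With `Unexpected`, the same decision would depend on matching the error message.

```rust
use std::fmt;

// Hypothetical error types; not the real iceberg-rust definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorKind {
    Unexpected,
    // Variant proposed in this PR; only useful if callers branch on it.
    TableWithoutSnapshot,
}

#[derive(Debug)]
struct Error {
    kind: ErrorKind,
    message: String,
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}: {}", self.kind, self.message)
    }
}

/// Hypothetical scan planning; `missing_kind` picks which kind is reported
/// when the table has no snapshot.
fn build_scan(has_snapshot: bool, missing_kind: ErrorKind) -> Result<Vec<String>, Error> {
    if has_snapshot {
        Ok(vec!["a.parquet".to_string()])
    } else {
        Err(Error {
            kind: missing_kind,
            message: "Can't scan table without snapshots".to_string(),
        })
    }
}

/// Treats only the dedicated kind as "empty"; every other error propagates.
fn plan_files_or_empty(result: Result<Vec<String>, Error>) -> Result<Vec<String>, Error> {
    match result {
        Err(e) if e.kind == ErrorKind::TableWithoutSnapshot => Ok(Vec::new()),
        other => other,
    }
}
```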
@Xuanwo @sdd could you guys please take a look?
Previously, the TableScan struct required a Snapshot to plan files, so for empty tables without a snapshot an error was returned instead of an empty result.
Following the same approach as the Java [0] and Python [1] implementations, this commit changes the snapshot property to accept None values, and the plan_files method now returns an empty stream if the snapshot is not present in PlanContext.

[0] https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/SnapshotScan.java#L119
[1] https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/__init__.py#L1979
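Here is a minimal synchronous sketch of the approach this commit describes. It assumes simplified, hypothetical `TableScan` and `Snapshot` types, not the real iceberg-rust ones. The snapshot becomes an `Option`, and `plan_files` yields an empty iterator, standing in for `stream::empty()`, when there is no snapshot.

```rust
// Hypothetical, simplified types; not the real iceberg-rust definitions.
#[derive(Debug, Clone)]
struct Snapshot {
    data_files: Vec<String>,
}

struct TableScan {
    // Optional now: a table with no snapshot can still be scanned.
    snapshot: Option<Snapshot>,
}

impl TableScan {
    /// Returns an empty iterator (synchronous analogue of `stream::empty()`)
    /// when there is no snapshot, instead of returning an error.
    fn plan_files(&self) -> Box<dyn Iterator<Item = String> + '_> {
        match &self.snapshot {
            None => Box::new(std::iter::empty::<String>()),
            Some(snapshot) => Box::new(snapshot.data_files.iter().cloned()),
        }
    }
}
```

Compared with the caller-side pattern, this keeps building a scan infallible for snapshot-less tables. The cost is that the scan, its builder and the planning context all have to carry an `Option`.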
Fixes: https://github.com/apache/iceberg-rust/issues/580