Closed Acehaidrey closed 1 day ago
Thank you, @Acehaidrey, for reporting this issue! It seems that the method buildChangelogScan()
does not properly set up the scan when the startTimestamp is newer than the timestamp of the current snapshot. Adding the following logic at this line should address the problem:
if (startTimestamp != null && table.currentSnapshot().timestampMillis() < startTimestamp) {
emptyScan = true;
}
However, I believe there's an opportunity to revisit the buildChangelogScan()
method more holistically. Refactoring it could improve its clarity and make it less prone to errors in the future. I'd be happy to collaborate or discuss further ideas for restructuring the method. Let me know what you think!
Hi @bryanck, do we still have time to include this bug fix in 1.7.1?
The 1.7.1 ship has sailed, but if this is a regression or high priority, we can try to get it in.
It is a regression, better to get it in. @Acehaidrey should be able to provide a quick fix soon, like today. Appreciate if you can hold 1.7.1 a bit more, @bryanck.
@Acehaidrey
Execution failed for task ':iceberg-spark:iceberg-spark-extensions-3.5_2.12:spotlessJavaCheck'.
> The following files had format violations:
src/test/java/org/apache/iceberg/spark/extensions/TestChangelogTable.java
@@ -460,7 +460,6 @@
····//········[4,·"d",·"INSERT",·3,·7807948732874377651L]]
····assertThat(results).as("Num·records·must·be·zero").isEmpty();
-
····//·Clean·up·the·changelog·view
····sql("DROP·VIEW·IF·EXISTS·test_changelog_view");
··}
@flyrain + @Acehaidrey We expect this test to fail right? Like we need to still add a fix to this pr?
We expect this test to fail right? Like we need to still add a fix to this pr?
That's correct!
Yes I can work on this in about one hour if that’s okay
On Wed, Nov 20, 2024 at 11:59 AM Yufei Gu @.***> wrote:
We expect this test to fail right? Like we need to still add a fix to this pr?
That's correct!
— Reply to this email directly, view it on GitHub https://github.com/apache/iceberg/pull/11564#issuecomment-2489118050, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACPNAR5NCCP65NCVQ32VLCD2BS5YVAVCNFSM6AAAAABR43PBQCVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIOBZGEYTQMBVGA . You are receiving this because you were mentioned.Message ID: @.***>
Hey @sfc-gh-ygu @flyrain @RussellSpitzer @bryanck I have gone ahead and updated this . Please if you can take a look - the test passes now as do the other tests.
Sorry for delays, home had a water leak today, and been a mess.
Fixed! Sorry I missed that
On Wed, Nov 20, 2024 at 7:10 PM Yufei Gu @.***> wrote:
@.**** commented on this pull request.
In spark/v3.5/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestChangelogTable.java https://github.com/apache/iceberg/pull/11564#discussion_r1851144167:
- " 'start-timestamp', '%d',"
- " 'end-timestamp', '%d'"
- " ),"
- " changelog_view => 'test_changelog_view'"
- ")",
- catalogName, tableName, startTime, endTime);
- // Query the changelog view
- List<Object[]> results =
- sql(
- "SELECT * FROM test_changelog_view WHERE _change_type IN ('INSERT', 'DELETE') ORDER BY _change_ordinal");
- // Verify no changes are returned since our window is after the inserts
- assertThat(results).as("Num records must be zero").isEmpty();
nit: remove the extra empty line?
— Reply to this email directly, view it on GitHub https://github.com/apache/iceberg/pull/11564#pullrequestreview-2449967049, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACPNAR6LLRHMGSZZRF5QD732BUQGXAVCNFSM6AAAAABR43PBQCVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDINBZHE3DOMBUHE . You are receiving this because you were mentioned.Message ID: @.***>
thank you @flyrain for all the help here - I actually took your advice, think it looks cleaner this way if you dont mind seeing once more
Thank you, I had to fix a formatting issue so pushed another update
@Acehaidrey @flyrain
The bug was due to we were not checking the timestamp of endSnapshot
calculated from endTimestamp
is less than the startTimestamp
. I've submitted #11612 to make the logic easier to reason about, hopefully.
Fix changelog table bug for start time older than current snapshot
Problem
After upgrading to Iceberg 1.5 and Spark 3.5.1, the
create_changelog_view
procedure occasionally returns records that were inserted outside (before) the specified time range. Specifically:Changes
Added a unit test
testChangelogViewOutsideTimeRange
toTestChangelogTable
that:Testing
To invoke the unit test:
./gradlew :iceberg-spark:iceberg-spark-extensions-3.5_2.12:test --tests "org.apache.iceberg.spark.extensions.TestChangelogTable"
Related Issue
TBD
Additional Context