apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.46k stars 966 forks

[flink] Fix query service error while query table with deletion vectors option #4301

Open Shadowell opened 1 month ago

Shadowell commented 1 month ago

Purpose

Test the query service with the deletion vectors option; it works for tables with the option 'deletion-vectors.enabled' = 'true'.

Linked issue: close #4265

Tests

Add test testQueryServiceWithDeletionVectors

API and Format

No

Documentation

No

herefree commented 1 month ago

In my test, CALL sys.query_service('default.DIM', 1) can be submitted to the Flink cluster, but the job keeps restarting because of exceptions. I don't know whether you have actually tested it on a Flink cluster.

java.lang.IllegalStateException: SortedRun is not sorted and may contain overlapping key intervals. This is a bug.
    at org.apache.paimon.utils.Preconditions.checkState(Preconditions.java:182)
    at org.apache.paimon.mergetree.SortedRun.validate(SortedRun.java:90)
    at org.apache.paimon.mergetree.SortedRun.fromUnsorted(SortedRun.java:67)
    at org.apache.paimon.mergetree.Levels.updateLevel(Levels.java:190)

herefree commented 1 month ago

[Two images attached]

Shadowell commented 1 month ago

Hi @herefree, I tested it on a local standalone cluster, and it does throw this exception. Because the option 'deletion-vectors.enabled' = 'true' is set, the file is upgraded from level-0 to level-5 during compaction. Then, while the element is being processed, the same file is added to the sorted run again. This duplicates the file in the run and causes the SortedRun validation to fail. cc @JingsongLi
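To make the failure mode above concrete, here is a minimal sketch of how adding the same file to a sorted run twice breaks a "no overlapping key intervals" check. It is a simplified model, not Paimon's actual code: `SortedRunSketch`, `FileMeta`, `isValid`, `addBlindly`, and `addIfAbsent` are hypothetical names, keys are plain integers, and the de-duplication by file name is only one possible way to avoid the problem, not necessarily the fix in this PR.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortedRunSketch {

    /** Hypothetical stand-in for a data file's metadata: a name and a key range. */
    static final class FileMeta {
        final String name;
        final int minKey;
        final int maxKey;

        FileMeta(String name, int minKey, int maxKey) {
            this.name = name;
            this.minKey = minKey;
            this.maxKey = maxKey;
        }
    }

    /** A run is valid when each file starts strictly after the previous file ends. */
    static boolean isValid(List<FileMeta> run) {
        for (int i = 1; i < run.size(); i++) {
            if (run.get(i).minKey <= run.get(i - 1).maxKey) {
                return false;
            }
        }
        return true;
    }

    /** Adds a file unconditionally, then re-sorts the run by minKey. */
    static List<FileMeta> addBlindly(List<FileMeta> run, FileMeta file) {
        List<FileMeta> result = new ArrayList<>(run);
        result.add(file);
        result.sort(Comparator.comparingInt(f -> f.minKey));
        return result;
    }

    /** Adds a file only if no file with the same name is already in the run. */
    static List<FileMeta> addIfAbsent(List<FileMeta> run, FileMeta file) {
        for (FileMeta existing : run) {
            if (existing.name.equals(file.name)) {
                return run;
            }
        }
        return addBlindly(run, file);
    }

    public static void main(String[] args) {
        // Assumed scenario: compaction already placed this file in the level-5 run.
        FileMeta upgraded = new FileMeta("data-0", 1, 1);
        List<FileMeta> level5 = addBlindly(new ArrayList<>(), upgraded);

        // The same file arrives again while processing the element.
        List<FileMeta> duplicated = addBlindly(level5, upgraded);
        List<FileMeta> deduplicated = addIfAbsent(level5, upgraded);

        System.out.println("valid after blind add: " + isValid(duplicated));
        System.out.println("valid after add-if-absent: " + isValid(deduplicated));
    }
}
```

In this model the duplicated run holds two entries with the identical key range, so the second entry's minimum key is not greater than the first entry's maximum key and the check fails. Skipping files that are already present keeps the run valid.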

Also, I found another bug during testing. I inserted only one record, but the query service shows that 2 records were processed. I will raise another issue to follow up on this. [Two images attached]

herefree commented 1 month ago

Thanks for your reply, looking forward to your fix~