apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0
874 stars 290 forks

[Bug]: Cyclic rewrite of a single pos-delete file #3294

Closed XBaith closed 3 weeks ago

XBaith commented 1 month ago

What happened?

When the optimizer has only one data file and one pos-delete file to rewrite, the pos-delete file may be rewritten over and over again, with each rewritten file deleted right afterwards. The specific RewriteInput case is as follows:

image

The file s3://wap-udp-calling-prod-useast1/wap_ci_usa_users_history/data/orgid_bucket=5/00036-657-872c6863-916b-46c5-911d-ad418d93570c-00001-deletes.parquet is rewritten into a new pos-delete file, which is then removed in the next snapshot.

image image
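To make the symptom easier to spot in snapshot history, here is a minimal diagnostic sketch in Python. It is not Amoro or Iceberg code: `SnapshotChange`, `short_lived_delete_files`, `looks_cyclic`, and the file names in `history` are all hypothetical, and it assumes you have already extracted, for each snapshot, the set of delete files it added and removed. It only flags the pattern described above (a pos-delete file written by one snapshot and removed by the next) and says nothing about the root cause.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotChange:
    """Hypothetical per-snapshot summary: delete files added and removed."""
    snapshot_id: int
    added_delete_files: set = field(default_factory=set)
    removed_delete_files: set = field(default_factory=set)


def short_lived_delete_files(changes):
    """Return delete files added in one snapshot and removed in the very next one."""
    flagged = []
    for prev, curr in zip(changes, changes[1:]):
        # A file written by one rewrite and dropped by the following snapshot.
        flagged.extend(sorted(prev.added_delete_files & curr.removed_delete_files))
    return flagged


def looks_cyclic(changes, min_rounds=3):
    """Heuristic (assumed threshold): several short-lived delete files in a row."""
    return len(short_lived_delete_files(changes)) >= min_rounds


# Made-up history: every snapshot replaces the previous pos-delete file with a new one.
history = [
    SnapshotChange(i, {f"d{i}-deletes.parquet"}, {f"d{i - 1}-deletes.parquet"})
    for i in range(1, 5)
]
cyclic = looks_cyclic(history)
```

The `min_rounds` threshold is arbitrary; a single short-lived delete file can be legitimate, so the check only fires once the pattern repeats.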

Affects Versions

master/0.7.x

What table formats are you seeing the problem on?

Iceberg

What engines are you seeing the problem on?

Optimizer

How to reproduce

No response

Relevant log output

No response

Anything else

No response

Are you willing to submit a PR?

Code of Conduct