apache / amoro

Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
https://amoro.apache.org/
Apache License 2.0

[Improvement]: Candidates for expired snapshot timestamp need to be adjusted #3322

Open XBaith opened 2 weeks ago

XBaith commented 2 weeks ago

What would you like to be improved?

Users in production have reported that orphaned files are not being cleaned up properly.

On inspection, the cause was that historical snapshots were not being expired as expected. The table contained snapshots written by Flink, but Flink was no longer writing to it regularly.

As a result, Amoro does not clean up the most recent snapshot written by Flink, even when that snapshot is much older than the configured snapshot retention time.
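
To make the behaviour concrete, here is a minimal sketch of how I understand the problem. It is not Amoro's actual code: the function name `expire_cutoff`, its parameters, and the dates are all hypothetical, and it assumes the expiration cutoff is taken as the earliest of several candidate timestamps, one of which is the commit time of the latest Flink-written snapshot.

```python
from datetime import datetime, timedelta
from typing import Optional


def expire_cutoff(
    now: datetime,
    retention: timedelta,
    latest_flink_commit: Optional[datetime] = None,
) -> datetime:
    """Return the timestamp before which snapshots may be expired.

    Hypothetical model: the cutoff is the minimum of all candidates, so any
    single old candidate holds back expiration for the whole table.
    """
    # Candidate 1: the configured retention window.
    candidates = [now - retention]
    # Candidate 2 (assumed): protect the latest snapshot written by Flink.
    if latest_flink_commit is not None:
        candidates.append(latest_flink_commit)
    return min(candidates)


# Illustrative values only.
now = datetime(2024, 11, 20, 12, 0)
retention = timedelta(hours=12)
stale_flink_commit = datetime(2024, 10, 1, 8, 0)  # Flink stopped writing here

cutoff_without_flink = expire_cutoff(now, retention)
cutoff_with_stale_flink = expire_cutoff(now, retention, stale_flink_commit)
```

Under these assumptions, `cutoff_without_flink` follows the retention window, while `cutoff_with_stale_flink` is pinned to the old Flink commit time. Every snapshot committed after that point is kept, so the files they reference are never released for cleanup.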

How should we improve?

No response