apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[Bug] Dedicated Compaction for write-only table can't clean up small files after running for a long time #2675

Open bridgeDream opened 10 months ago

bridgeDream commented 10 months ago

Search before asking

Paimon version

0.6

Compute Engine

flink 1.16

Minimal reproduce step

  1. Start a Flink job writing to a Paimon table in "write-only" mode, with the checkpoint interval set to 5s.
  2. Start a dedicated compaction Flink job to compact the Paimon table.
  3. Let both jobs run for more than 1 day.
  4. Some small files from the previous day still exist.
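For reference, the dedicated-compaction step can be sketched roughly as below. The warehouse path, database, and table names are placeholders, and the exact action jar filename depends on the Paimon release; consult the Paimon docs for your version:

```shell
# Dedicated compaction job (sketch; paths and jar name are placeholders).
# The writer job runs with table option 'write-only' = 'true', so compaction
# is delegated entirely to this job.
<FLINK_HOME>/bin/flink run \
    /path/to/paimon-flink-action-<version>.jar \
    compact \
    --warehouse hdfs:///path/to/warehouse \
    --database my_db \
    --table my_table
```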

What doesn't meet your expectations?

After the writing and compaction jobs had run for more than 2 days, I found that small files with timestamp "2023-12-28 19:43:36" still existed on 2023-12-29. (screenshot attached)

Anything else?

No response

Are you willing to submit a PR?

wg1026688210 commented 10 months ago

Hi~ @bridgeDream, did you set the snapshot expiration config?
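For context, snapshot retention in Paimon is controlled by table options; old data files are only deleted once the snapshots referencing them expire. A minimal sketch of tuning these options (the table name and the chosen values are illustrative, not recommendations):

```sql
-- Hypothetical table; these Paimon table options control how long
-- snapshots (and the data files they reference) are retained.
ALTER TABLE my_table SET (
    'snapshot.time-retained' = '1 h',
    'snapshot.num-retained.min' = '10',
    'snapshot.num-retained.max' = '50'
);
```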

bridgeDream commented 10 months ago

> Hi~ @bridgeDream, did you set the snapshot expiration config?

@wg1026688210

No, I'm just using the default config. (screenshot attached)

AnemoneIndicum commented 9 months ago

I have the same problem as well.

wg1026688210 commented 9 months ago

You can try cleaning up orphan files and then check whether these small files get removed. If they are not removed, it is likely that the snapshots referencing them have not expired yet.
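The orphan-file cleanup mentioned above can be run via the Paimon Flink action jar, roughly as follows. This is a sketch: the jar name, warehouse path, and names are placeholders, and the `--older_than` timestamp should be chosen conservatively so that files still needed by readers are not touched:

```shell
# Remove orphan files older than a given timestamp (sketch; placeholders).
# Files still referenced by any live snapshot are NOT orphans and will be
# kept; if the small files survive this, their snapshots have not expired.
<FLINK_HOME>/bin/flink run \
    /path/to/paimon-flink-action-<version>.jar \
    remove_orphan_files \
    --warehouse hdfs:///path/to/warehouse \
    --database my_db \
    --table my_table \
    --older_than '2023-12-27 00:00:00'
```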

dierbei commented 7 months ago

@bridgeDream @AnemoneIndicum I had the same problem, did you guys solve it?