Open dheemanthgowda opened 1 month ago
Thanks for raising this, @dheemanthgowda. Could you also update the subject, please?
There is an earlier issue that also explains your problem: https://github.com/apache/hudi/issues/11436
@dheemanthgowda Thanks for the feedback. It looks like your table has no partitioning fields, so each compaction triggers a whole-table rewrite, which is indeed costly for streaming ingestion. Have you tried moving compaction out into a separate job?
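One way to sketch that separation (a hypothetical example; the table name, schema, and path are placeholders): keep compaction *scheduling* in the streaming writer but disable its execution, then run Hudi's offline Flink compactor against the same table path.

```sql
-- Hypothetical sketch: the streaming writer only schedules compaction plans;
-- a separate job executes them, so ingestion is not blocked by rewrites.
CREATE TABLE hudi_sink (
  id BIGINT,
  name STRING,
  ts TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 's3a://my-bucket/hudi/my_table',  -- placeholder path
  'table.type' = 'MERGE_ON_READ',
  'write.operation' = 'upsert',
  'compaction.async.enabled' = 'false',      -- do not execute compaction in the writer
  'compaction.schedule.enabled' = 'true'     -- still generate compaction plans
);
```

The scheduled plans can then be executed by the standalone compactor shipped in the Flink bundle, e.g. `flink run -c org.apache.hudi.sink.compact.HoodieFlinkCompactor <hudi-flink-bundle jar> --path s3a://my-bucket/hudi/my_table`.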
@dheemanthgowda Were you able to investigate this further using async compaction?
Describe the problem you faced
We were experiencing slow upsert performance when using Hudi with Flink SQL on AWS S3. We tried enabling the metadata table, which improved upsert speed, but the cleaner does not trigger even after 3 commits.
To Reproduce
Steps to reproduce the behavior:
Configure Hudi with the following settings for upserting data via Flink SQL:
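(The exact settings were not included in the report; a representative Flink SQL DDL for this kind of setup might look like the following, with table name, schema, path, and values as illustrative placeholders. Option names are per Hudi's Flink configuration and worth double-checking against the version in use.)

```sql
CREATE TABLE hudi_upsert_sink (
  id BIGINT,
  payload STRING,
  ts TIMESTAMP(3),
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 's3a://my-bucket/hudi/upsert_table',  -- placeholder path
  'table.type' = 'MERGE_ON_READ',
  'write.operation' = 'upsert',
  'metadata.enabled' = 'true',     -- the change that improved upsert speed
  'clean.async.enabled' = 'true',  -- run cleaning asynchronously after commits
  'clean.retain_commits' = '3'     -- illustrative: keep only the last 3 commits
);
```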
Run a batch job to perform upserts.
Monitor logs for cleaning operations.

Expected behavior
We expect the cleaner to trigger and remove older commits as per the defined configuration.
Environment Description
Hudi version: 1.14.1
Flink version: 1.17.1
Storage (HDFS/S3/GCS..): S3
Running on Docker? (yes/no): no, running on K8s

Additional context

After setting metadata.enabled to true, we observed a notable improvement in upsert speed. However, the cleaner does not seem to be functioning as expected. Are we missing any configs?
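If cleaning never runs, the Flink-side cleaner and archiving options are worth double-checking. A minimal sketch of the relevant knobs (values illustrative; option names per Hudi's Flink configuration):

```sql
-- Cleaner-related options on the Hudi Flink sink (illustrative values).
-- Note: archiving must not remove commits the cleaner still needs, i.e.
-- clean.retain_commits < archive.min_commits <= archive.max_commits.
'clean.async.enabled'  = 'true',   -- run cleaning asynchronously after commits
'clean.retain_commits' = '10',     -- number of commits the cleaner retains
'archive.min_commits'  = '20',     -- timeline archiving lower bound
'archive.max_commits'  = '30'      -- timeline archiving upper bound
```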