[Feature] Optimize heap memory usage during full compaction of manifest files

codeTai commented 1 week ago

Search before asking

[X] I searched in the issues and found nothing similar.

Motivation

When submitting a snapshot triggers a full compaction of the manifest files, we would like to reduce TaskManager heap memory usage.

Solution

Writing HDFS files is slow, while reading them is fast. The code logic is therefore optimized so that it no longer reads multiple manifest files at the same time and accumulates their data in memory.
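
To make the idea concrete, here is a minimal sketch, not the actual patch: manifest files are read one at a time through a lazy iterator, so only the entries of the file currently being consumed need to stay on the heap while the (slow) writer drains them. The class `LazyManifestMerge`, the method `readSequentially`, and the string "entries" are hypothetical names used for illustration only; none of them are Paimon APIs.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical illustration; none of these names come from Paimon. */
class LazyManifestMerge {

    /**
     * Returns a lazy view over the entries of all files. A file is only
     * loaded (via reader) once every entry of the previous file has been
     * consumed, so at most one file's entries are referenced at a time.
     */
    public static <F, E> Iterable<E> readSequentially(
            List<F> files, Function<F, List<E>> reader) {
        return () -> new Iterator<E>() {
            private final Iterator<F> fileIt = files.iterator();
            private Iterator<E> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Advance to the next non-empty file only when needed.
                while (!current.hasNext() && fileIt.hasNext()) {
                    current = reader.apply(fileIt.next()).iterator();
                }
                return current.hasNext();
            }

            @Override
            public E next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        List<String> files = List.of("manifest-1", "manifest-2", "manifest-3");
        AtomicInteger loaded = new AtomicInteger();

        // Stand-in for reading a manifest file: two fake entries per file.
        Function<String, List<String>> reader = name -> {
            loaded.incrementAndGet();
            return List.of(name + "/entry-a", name + "/entry-b");
        };

        Iterator<String> it = readSequentially(files, reader).iterator();
        it.next();
        // Only the first file has been loaded at this point.
        System.out.println("files loaded after first entry: " + loaded.get());

        int total = 1;
        while (it.hasNext()) {
            it.next(); // a real job would write the entry out here
            total++;
        }
        System.out.println("entries: " + total + ", files loaded: " + loaded.get());
    }
}
```

In this sketch the entries of an already-consumed file are no longer referenced by the iterator, so they become eligible for garbage collection while later files are still waiting to be read. Reading files in parallel batches with a bounded queue is another way to get a similar memory bound.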

Anything else?

Part of the debug log:

Heap memory usage before optimization:

Heap memory usage after optimization:

Are you willing to submit a PR?

[X] I'm willing to submit a PR!

JingsongLi commented 1 week ago

Hi @codeTai, I created #3598, which uses ScanParallelExecutor.parallelismBatchIterable. Can you validate this PR in your testing environment?

JingsongLi commented 1 week ago

@codeTai If #3598 cannot solve your problem, please create a new pull request.

codeTai commented 1 week ago

Ok, I'll test it later.

codeTai commented 1 week ago

ScanParallelExecutor.parallelismBatchIterable can solve my problem, thanks.

apache / paimon