codeTai closed this 1 week ago
Hi @codeTai, I created #3598 to use ScanParallelExecutor.parallelismBatchIterable. Can you validate this PR in your testing environment?
@codeTai If #3598 does not solve your problem, please create a new pull request.
Ok, I'll test it later.
ScanParallelExecutor.parallelismBatchIterable can solve my problem, thanks.
Search before asking
Motivation
When submitting a snapshot triggers a full compaction of the manifest files, we want to reduce TaskManager heap memory usage.
Solution
Writing HDFS files is slow, but reading them is fast, so entries read from several manifest files at once pile up in memory faster than they can be written out. The code is changed to avoid reading multiple manifest files at the same time, which keeps that data from accumulating in memory.
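A minimal sketch of the idea, not the actual Paimon change: the class `SequentialManifestReader`, the method `readLazily`, and the `reader` callback are hypothetical names used only for illustration. It assumes each manifest file can be read into a list of entries, and it opens the next file only after the entries of the previous one have been consumed, so at most one file's entries are held on the heap while the slow writer drains them.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Hypothetical helper (not part of Paimon): reads files lazily, one at a time. */
public class SequentialManifestReader {

    /**
     * Returns an Iterable whose iterator calls {@code reader} for a file only
     * after every entry of the previous file has been handed to the consumer.
     */
    public static <F, E> Iterable<E> readLazily(List<F> files, Function<F, List<E>> reader) {
        return () -> new Iterator<E>() {
            private final Iterator<F> fileIterator = files.iterator();
            private Iterator<E> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Move on to the next file only when the current one is drained.
                while (!current.hasNext() && fileIterator.hasNext()) {
                    current = reader.apply(fileIterator.next()).iterator();
                }
                return current.hasNext();
            }

            @Override
            public E next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }
}
```

A consumer that writes each entry as it iterates then bounds heap usage by the size of a single manifest file rather than by the total size of all files being compacted. The trade-off is that reads are no longer overlapped, which is acceptable here because writing, not reading, is the bottleneck.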
Anything else?
Part of the debug log:![image](https://github.com/apache/paimon/assets/20182632/b6e3a7ca-819d-4c98-9500-a06c516729b7)
Heap memory usage before optimization:
Heap memory usage after optimization:![image](https://github.com/apache/paimon/assets/20182632/533a7fdf-405b-4474-9c69-b633b4224b9e)
Are you willing to submit a PR?