[X] I have searched in the issues and found no similar issues.
Describe the feature
In the current codebase, before a partition is marked as huge, its data is kept in memory as long as there is enough capacity. Once it is marked as a huge partition, its data should be flushed to HDFS, if that is configured.
The first flush of such a huge partition can be large, especially when the buffer capacity is large. It is also slow, because it is a single huge flush event that does not benefit from the concurrent HDFS partition writing mechanism.
The data also occupies memory until the flush finishes, which puts backpressure on clients.
From this point of view, smaller flush events are better for shuffle-server throughput. Local disk IO, however, prefers large flush buffers, so this is a trade-off.
In any case, splitting the huge flush event of a huge partition into multiple smaller events would be useful for improving write performance.
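To make the idea concrete, here is a minimal sketch of how one large flush could be cut into several smaller ones. It is only an illustration under assumptions: `FlushSplitter`, `split`, and `maxEventBytes` are hypothetical names, not existing classes or config keys in the codebase, and blocks are modeled as plain `byte[]` rather than the real block type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the existing codebase.
final class FlushSplitter {
  private FlushSplitter() {}

  // Greedily groups blocks into batches of at most maxEventBytes each.
  // A single block larger than maxEventBytes gets a batch of its own.
  static List<List<byte[]>> split(List<byte[]> blocks, long maxEventBytes) {
    if (maxEventBytes <= 0) {
      throw new IllegalArgumentException("maxEventBytes must be positive");
    }
    List<List<byte[]>> events = new ArrayList<>();
    List<byte[]> current = new ArrayList<>();
    long currentBytes = 0;
    for (byte[] block : blocks) {
      // Close the current batch if adding this block would exceed the limit.
      if (!current.isEmpty() && currentBytes + block.length > maxEventBytes) {
        events.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(block);
      currentBytes += block.length;
    }
    if (!current.isEmpty()) {
      events.add(current);
    }
    return events;
  }
}
```

Each returned batch could then be submitted as its own flush event, so the batches can be written concurrently and memory is released batch by batch. Choosing `maxEventBytes` is where the trade-off above shows up: a smaller value helps HDFS throughput, a larger one favors local disk IO.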
Motivation
No response
Describe the solution
No response
Additional context
No response
Are you willing to submit PR?