Summary
The current code calculates the write parallelism to avoid creating too many small files. However, this logic does not work well for the group by batch upload job, because that job's output table is unpartitioned. As a result, the job writes its output with very low parallelism (around 10).
This PR enforces a minimum write parallelism of 200, which guarantees a baseline level of parallelism in this scenario.
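A minimal sketch of the floor described above, assuming the existing logic yields an integer parallelism estimate. `write_parallelism` and `MIN_WRITE_PARALLELISM` are hypothetical names for illustration, not the identifiers used in the actual change.

```python
# Hypothetical constant: the minimum write parallelism introduced by this PR.
MIN_WRITE_PARALLELISM = 200


def write_parallelism(estimated: int, min_parallelism: int = MIN_WRITE_PARALLELISM) -> int:
    """Return the parallelism to write with, never lower than the floor.

    `estimated` stands in for whatever the existing small-file-avoidance
    logic computes (hypothetical here). Estimates above the floor are
    returned unchanged; only low estimates are raised.
    """
    return max(estimated, min_parallelism)
```

With this floor, an estimate of 10 (the unpartitioned case) becomes 200, while an estimate of 500 stays at 500. The trade-off is that a small output may now be written as more, smaller files than before.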
Why / Goal
The goal is to improve the write performance of the group by batch upload job.
Test Plan
Checklist
Reviewers
@pkundurthy @hzding621 @yuli-han