Support setting the number of partitions used for writing by implementing the RequiresDistributionAndOrdering interface. The configurations are:
starrocks.write.num.partitions: the number of partitions used for writing
starrocks.write.partition.columns: the columns used for hash partitioning
The repartitioning is hash-based, so it introduces shuffle cost. I have not found a way to repartition without a shuffle, as Spark's Dataset#coalesce does, under the DataSource V2 API. This can be improved in the future if possible.
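To illustrate why the hash-based repartitioning described above implies a shuffle, here is a minimal, Spark-free sketch in plain Java. It is not the connector's code: the class HashPartitionSketch and its helpers are hypothetical, the comma-separated format for the partition columns is an assumption, and the simple 31-based hash is a stand-in for whatever hash function Spark applies internally. Only the two configuration keys come from this PR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only; not part of the connector.
final class HashPartitionSketch {
    // Configuration keys introduced by this PR.
    static final String NUM_PARTITIONS = "starrocks.write.num.partitions";
    static final String PARTITION_COLUMNS = "starrocks.write.partition.columns";

    /** Splits a column list. The comma-separated format is an assumption. */
    static List<String> parseColumns(String value) {
        List<String> columns = new ArrayList<>();
        if (value == null) {
            return columns;
        }
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                columns.add(name);
            }
        }
        return columns;
    }

    /**
     * Assigns a row to a partition from the hash of its partition columns.
     * The hash here is a stand-in, not Spark's actual hash function.
     */
    static int partitionFor(Map<String, Object> row, List<String> columns, int numPartitions) {
        int hash = 0;
        for (String column : columns) {
            hash = 31 * hash + Objects.hashCode(row.get(column));
        }
        // floorMod keeps the result in [0, numPartitions) even for negative hashes.
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(NUM_PARTITIONS, "4");
        options.put(PARTITION_COLUMNS, "id");

        int numPartitions = Integer.parseInt(options.get(NUM_PARTITIONS));
        List<String> columns = parseColumns(options.get(PARTITION_COLUMNS));

        for (int id = 1; id <= 6; id++) {
            Map<String, Object> row = new HashMap<>();
            row.put("id", id);
            row.put("name", "user-" + id);
            System.out.println("id=" + id + " -> partition "
                    + partitionFor(row, columns, numPartitions));
        }
    }
}
```

Because the target partition depends only on the values of the partition columns, rows with the same key must end up in the same output partition no matter which input partition they came from. That is why rows have to move between executors (a shuffle), whereas a coalesce-style approach only merges existing partitions in place.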
Checklist:
[ ] I have added test cases for my bug fix or my new feature
[ ] This pr will affect users' behaviors
[ ] This pr needs user documentation (for new or modified features or behaviors)
[ ] I have added documentation for my new feature or new function
What type of PR is this:
Which issues of this PR fixes :
Fixes #
Problem Summary(Required) :