StarRocks / starrocks-connector-for-apache-spark

Apache License 2.0

[Feature] Support to set the number of partitions for write #35

Closed banmoy closed 1 year ago

banmoy commented 1 year ago

What type of PR is this:

Which issues does this PR fix:

Fixes #

Problem Summary (Required):

Support setting the number of partitions for writes by implementing the RequiresDistributionAndOrdering interface. The configurations are

The repartitioning is hash-based, which introduces shuffle cost. I have not found a way to repartition without a shuffle, as Spark's Dataset#coalesce does, under the DataSource V2 API. This can be improved in the future if possible.
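
To illustrate the trade-off described above, here is a toy model in plain Java. It is not Spark code and not this PR's implementation. The class RepartitionSketch, its method names, and the merge rule used in coalesce are all hypothetical, chosen only to contrast routing every row by hash (any row may move to any output partition, which is what a shuffle pays for) with a coalesce-style merge (whole input partitions are concatenated and no row is routed individually).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: partitions are plain lists of integer keys.
public class RepartitionSketch {

    // Hash repartition: each row is routed by the hash of its key, so rows
    // from one input partition can be scattered across all output partitions.
    public static List<List<Integer>> hashRepartition(List<List<Integer>> input, int numPartitions) {
        List<List<Integer>> out = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            out.add(new ArrayList<>());
        }
        for (List<Integer> partition : input) {
            for (int row : partition) {
                int target = Math.floorMod(Integer.hashCode(row), numPartitions);
                out.get(target).add(row);
            }
        }
        return out;
    }

    // Coalesce-style merge (assumed rule: contiguous input partitions are
    // grouped together): whole partitions are concatenated, rows are never split up.
    public static List<List<Integer>> coalesce(List<List<Integer>> input, int numPartitions) {
        int n = Math.min(numPartitions, input.size());
        List<List<Integer>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(new ArrayList<>());
        }
        for (int i = 0; i < input.size(); i++) {
            out.get(i * n / input.size()).addAll(input.get(i));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Integer>> input = List.of(
                List.of(1, 2, 3), List.of(4, 5, 6), List.of(7, 8), List.of(9, 10));
        System.out.println("hash:     " + hashRepartition(input, 2));
        System.out.println("coalesce: " + coalesce(input, 2));
    }
}
```

In the hash case, rows from every input partition end up in both output partitions. In the coalesce case, each input partition lands intact in exactly one output partition. The second behavior is the one the description says could not be expressed under the DataSource V2 API.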

Checklist: