Pull Request checklist
Current behavior : (link existing issues here : https://help.github.com/articles/basic-writing-and-formatting-syntax/#referencing-issues-and-pull-requests)
"distribution-mode": "hash" is the new default and requests that Spark use a hash-based exchange to shuffle the incoming write data before writing. Practically, this means that each row is hashed on its partition value and placed in the Spark task that corresponds to that hash.
New behavior :
Setting "distribution-mode": "none" does not request that Spark perform any shuffle or sort automatically.
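To make the difference between the two modes concrete, here is a toy model in plain Python. It is only a sketch: the function names (assign_tasks, partitions_per_task) are hypothetical, CRC32 stands in for whatever hash Spark uses internally, and "none" is modeled as round-robin by input position, which is an assumption about how unshuffled data might already be laid out rather than something Spark guarantees.

```python
import zlib
from collections import defaultdict


def assign_tasks(rows, partition_key, num_tasks, mode="hash"):
    """Toy model: return {task_id: [rows]} for the given distribution mode."""
    tasks = defaultdict(list)
    for position, row in enumerate(rows):
        if mode == "hash":
            # Stand-in hash (CRC32). Spark's real exchange uses its own hash,
            # so actual task ids will differ; only the grouping idea carries over.
            value = str(row[partition_key]).encode("utf-8")
            task_id = zlib.crc32(value) % num_tasks
        elif mode == "none":
            # No shuffle is requested, so rows stay where they already are.
            # Assumption for illustration: input arrives spread round-robin.
            task_id = position % num_tasks
        else:
            raise ValueError(f"unsupported mode: {mode!r}")
        tasks[task_id].append(row)
    return dict(tasks)


def partitions_per_task(tasks, partition_key):
    """Return {task_id: set of partition values that task would write}."""
    return {
        task_id: {row[partition_key] for row in task_rows}
        for task_id, task_rows in tasks.items()
    }


if __name__ == "__main__":
    days = ["mon", "tue", "wed"]
    rows = [{"id": i, "day": days[i % 3]} for i in range(6)]

    for mode in ("hash", "none"):
        layout = partitions_per_task(assign_tasks(rows, "day", 2, mode), "day")
        print(mode, layout)
```

In this model, "hash" sends every row with the same partition value to a single task, so each partition is written by one task. With "none", a task can hold rows for many partitions and may write a file for each of them, which is the trade-off to keep in mind when turning the shuffle off.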