filodb / FiloDB

Distributed Prometheus time series database

Apache License 2.0

misc(export-iceberg) Iceberg export disable automatic shuffle while appending data to table #1731

Closed nikitag55 closed 7 months ago

nikitag55 commented 7 months ago

Pull Request checklist

[ ] The commit(s) message(s) follows the contribution guidelines ?
[ ] Tests for the changes have been added (for bug fixes / features) ?
[ ] Docs have been added / updated (for bug fixes / features) ?

Current behavior : (link existing issues here : https://help.github.com/articles/basic-writing-and-formatting-syntax/#referencing-issues-and-pull-requests)

Currently, "distribution-mode": "hash" is the new default. It requests that Spark use a hash-based exchange to shuffle the incoming write data before writing. In practice, each row is hashed on its partition value and routed to the Spark task that corresponds to that hash.
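To make the routing concrete, here is a minimal pure-Python sketch of a hash-based exchange. It is not Spark or Iceberg code: `hash_distribute`, the `day` partition column, and the use of `zlib.crc32` as the hash are all assumptions made for illustration. The only property it is meant to show is that rows sharing a partition value end up in the same task.

```python
import zlib
from collections import defaultdict


def hash_distribute(rows, num_tasks, partition_key):
    """Route each row to a task chosen by hashing its partition value."""
    tasks = defaultdict(list)
    for row in rows:
        # zlib.crc32 is a stand-in for the engine's hash function (assumption);
        # unlike Python's built-in hash() on str, it is stable across runs.
        task_id = zlib.crc32(str(row[partition_key]).encode()) % num_tasks
        tasks[task_id].append(row)
    return dict(tasks)


# Hypothetical rows, partitioned by "day".
rows = [
    {"day": "2024-01-01", "value": 1},
    {"day": "2024-01-02", "value": 2},
    {"day": "2024-01-01", "value": 3},
    {"day": "2024-01-03", "value": 4},
    {"day": "2024-01-02", "value": 5},
]

tasks = hash_distribute(rows, num_tasks=4, partition_key="day")
for task_id, task_rows in sorted(tasks.items()):
    print(task_id, [r["day"] for r in task_rows])
```

Two different partition values may still hash to the same task, but a single partition value is never split across tasks.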

New behavior : Setting "distribution-mode": "none" does not request any shuffle or sort from Spark, so nothing is performed automatically before the append.
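The trade-off between the two modes can be sketched with a small pure-Python model. It assumes, as a simplification, that every task writes one data file per partition value it holds; `files_written` and the sample `incoming` layout are hypothetical and are not taken from FiloDB or Iceberg.

```python
def files_written(tasks, partition_key):
    """Count (task, partition value) pairs, assuming one data file per pair."""
    return sum(
        len({row[partition_key] for row in task_rows}) for task_rows in tasks
    )


# Hypothetical layout of incoming rows across three Spark tasks.
incoming = [
    [{"day": "2024-01-01"}, {"day": "2024-01-02"}],
    [{"day": "2024-01-01"}, {"day": "2024-01-03"}],
    [{"day": "2024-01-02"}, {"day": "2024-01-03"}],
]

# "none": tasks are written as they arrive, with no exchange in between.
none_files = files_written(incoming, "day")

# "hash": rows are regrouped first so that each day lives in a single task
# (simplified here to one task per day).
by_day = {}
for task_rows in incoming:
    for row in task_rows:
        by_day.setdefault(row["day"], []).append(row)
hash_files = files_written(list(by_day.values()), "day")

print(none_files, hash_files)  # 6 3
```

Under this model, skipping the shuffle saves the cost of the exchange but can leave more, smaller files per partition, which is the usual consideration when choosing between the two settings.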