filodb / FiloDB

Distributed Prometheus time series database
Apache License 2.0

misc(iceberg-export): sort within partitions before writing the data to table #1733

Closed (nikitag55 closed 7 months ago)

nikitag55 commented 7 months ago

Pull Request checklist

Current behavior :

New behavior :

This PR tries sortWithinPartitions(), which returns a new Dataset with each partition sorted by the given expressions (here, the partition columns), before writing to the table. As mentioned above, "the data must be manually sorted by partition value. The data must be sorted either within each spark task, or globally within the entire dataset".
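For illustration only, here is a minimal sketch of what a per-task sort before an Iceberg append could look like. It is not the code from this PR. The object name, the partition column names (`year`, `month`, `day`), and the table identifier passed by the caller are all hypothetical, and the sketch assumes a Spark 3 session with an Iceberg catalog already configured.

```scala
import org.apache.spark.sql.DataFrame

object SortedIcebergWrite {
  // Hypothetical partition columns; replace with the columns in the
  // target table's actual partition spec.
  val partitionCols: Seq[String] = Seq("year", "month", "day")

  // Sort rows inside each Spark partition (task) by the partition columns.
  // This orders rows locally within each task and does not shuffle data
  // across tasks.
  def sortedByPartition(df: DataFrame): DataFrame =
    df.sortWithinPartitions(partitionCols.head, partitionCols.tail: _*)

  // Append the locally sorted rows to an existing table using the
  // DataFrameWriterV2 API. `table` is a caller-supplied identifier,
  // e.g. "catalog.db.table" (assumed to exist already).
  def appendSorted(df: DataFrame, table: String): Unit =
    sortedByPartition(df).writeTo(table).append()
}
```

Sorting within each task satisfies the "within each spark task" half of the quoted requirement. The alternative, a global sort of the entire dataset, would add a shuffle.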

alextheimer commented 7 months ago

Docs linked for reference: https://iceberg.apache.org/docs/latest/spark-writes/#writing-distribution-modes