databrickslabs / tempo

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, sum, count, etc.), AS OF joins, downsampling, and interpolation
https://pypi.org/project/dbl-tempo

Remove partition columns from Z Order optimization in `io.py` #244

Open R7L208 opened 2 years ago

R7L208 commented 2 years ago

In `io.py`, we Z ORDER on `partitionCols + optimizationCols` when `useDeltaOpt` is `True`. Because partition pruning already works without Z-Ordering on partition columns, I believe it makes sense to remove them from the Z ORDER clause and optimize only on `optimizationCols`, if they are provided.

Is there any advantage to including partition columns in Z ORDER for time series, other than data skipping?
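
To make the proposal concrete, here is a minimal sketch of how the Z ORDER clause could be built from the optimization columns only. The helper name `build_optimize_sql` and its parameters are hypothetical and are not taken from `io.py`; the sketch only assembles the SQL string and assumes the caller would pass it to `spark.sql` when `useDeltaOpt` is `True`.

```python
from typing import List, Optional


def build_optimize_sql(
    table_name: str,
    partition_cols: List[str],
    optimization_cols: Optional[List[str]] = None,
) -> Optional[str]:
    """Hypothetical helper: build an OPTIMIZE ... ZORDER BY statement
    that leaves partition columns out of the Z ORDER clause."""
    partition_set = set(partition_cols)
    zorder_cols: List[str] = []
    for col in optimization_cols or []:
        # Skip partition columns and avoid listing a column twice.
        if col not in partition_set and col not in zorder_cols:
            zorder_cols.append(col)
    if not zorder_cols:
        # Nothing left to Z ORDER on, so no statement is produced.
        return None
    return "OPTIMIZE {} ZORDER BY ({})".format(table_name, ", ".join(zorder_cols))
```

With this shape, a call such as `build_optimize_sql("trades", ["symbol"], ["event_time", "symbol"])` would Z ORDER on `event_time` only, and a call with no optimization columns would return `None` so the caller can skip the Z ORDER step.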

rportilla-databricks commented 2 years ago

We should be able to remove partition columns.

rportilla-databricks commented 1 year ago

This will be resolved when streaming AS OF joins are merged.