databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0

Liquid clustering databricks feature request #809

Closed (Ram-Dev7 closed 1 month ago)

Ram-Dev7 commented 1 month ago

Describe the feature

I would like dbt-databricks to support Liquid Clustering natively as part of the dbt configuration. Currently, clustering can be achieved via manual configurations, such as partitioning or clustering by a specific column, but there is no direct support for Liquid Clustering or similar dynamic partitioning and file optimization strategies. A built-in way to manage Liquid Clustering within dbt models would simplify optimizing table layouts based on query patterns and improve the developer experience.

Describe alternatives you've considered

Currently, users can manually trigger Databricks' file optimization features (e.g., Z-Ordering or Auto Optimize) using post-hooks within dbt models. While this works, it is a more manual approach and less streamlined than native Liquid Clustering support in dbt-databricks would be. The other alternative is to configure and maintain these optimizations directly in Databricks or through Databricks SQL.
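For illustration, the post-hook workaround described above might look roughly like the following. This is a minimal sketch, not taken from the issue: the model `stg_events` and the column `event_date` are hypothetical, and the hook simply runs Databricks' `OPTIMIZE ... ZORDER BY` statement against the model after it is built.

```sql
-- Hypothetical dbt model using a post-hook to Z-order the table after each build.
-- `stg_events` and `event_date` are placeholder names for illustration only.
{{ config(
    materialized='table',
    post_hook="OPTIMIZE {{ this }} ZORDER BY (event_date)"
) }}

select *
from {{ ref('stg_events') }}
```

The optimization logic lives in a hand-written SQL string here, which is the manual step the request is asking to avoid.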

Additional context

This feature would integrate Databricks' native Liquid Clustering features into dbt-databricks, allowing for a more seamless workflow. It would also help dbt users take full advantage of the advanced data layout optimizations in Databricks Delta Lake without requiring additional custom post-hook configurations or Databricks SQL commands.

Who will this benefit?

This feature would benefit dbt-databricks users who work with large-scale data on Databricks and need advanced partitioning and clustering strategies. For example, data engineers managing large datasets (e.g., sales transactions, event logs) who need to optimize query performance based on user access patterns would benefit greatly from having Liquid Clustering as a native dbt configuration option.

Are you interested in contributing this feature?

I am interested in contributing to this feature if needed. Please let me know how I can assist or what guidance you can provide in implementing this functionality.

benc-db commented 1 month ago

I'm uncertain what you're asking for...we support liquid clustering with liquid_clustered_by
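As a rough sketch of what using the `liquid_clustered_by` config mentioned above could look like in a model file (the model `stg_events` and the column `event_date` are hypothetical placeholders, and the adapter documentation should be checked for the exact options supported by your dbt-databricks version):

```sql
-- Hypothetical dbt model that asks dbt-databricks to create a liquid-clustered table.
-- `stg_events` and `event_date` are placeholder names for illustration only.
{{ config(
    materialized='table',
    liquid_clustered_by='event_date'
) }}

select *
from {{ ref('stg_events') }}
```

With this config the clustering column is declared on the model itself, so no separate post-hook is needed for the table layout.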

Ram-Dev7 commented 1 month ago

Oh, it's already available. Could you please share the related docs for reference? Thanks.

jtmcn commented 1 month ago

@Ram-Dev7 documentation