Describe the feature
I would like dbt-databricks to support Liquid Clustering natively as a model configuration. Today a model can be partitioned or clustered by specific columns through existing configs, but there is no direct way to declare Liquid Clustering keys on a model. A built-in option for managing Liquid Clustering within dbt models would make it simpler to tune table layout to query patterns and would improve the developer experience.
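A rough sketch of the idea, written in Python for illustration only: given a list of clustering columns from a model's config, the adapter would render a CLUSTER BY clause into the CREATE TABLE statement it already generates. The function names, the example relation, and the SELECT are hypothetical and are not part of dbt-databricks; only the CLUSTER BY clause itself is Databricks SQL.

```python
from typing import Sequence


def render_cluster_by(columns: Sequence[str]) -> str:
    """Render a Liquid Clustering clause for the given columns (hypothetical helper)."""
    if not columns:
        raise ValueError("at least one clustering column is required")
    return f"CLUSTER BY ({', '.join(columns)})"


def render_create_table(relation: str, select_sql: str, columns: Sequence[str]) -> str:
    """Render a CREATE OR REPLACE TABLE ... CLUSTER BY ... AS SELECT statement."""
    return (
        f"CREATE OR REPLACE TABLE {relation}\n"
        f"{render_cluster_by(columns)}\n"
        f"AS {select_sql}"
    )


if __name__ == "__main__":
    # Example relation and columns are made up for this sketch.
    print(
        render_create_table(
            "analytics.sales_transactions",
            "SELECT * FROM raw.sales",
            ["customer_id", "order_date"],
        )
    )
```

With something like this in the adapter, a model would only need to list its clustering columns, and the DDL would follow from that.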
Describe alternatives you've considered
Currently, users can run Databricks' file layout optimizations (e.g., OPTIMIZE with Z-Ordering) from post-hooks on dbt models, or rely on Auto Optimize. This works, but every model needs its own hand-written hook, which is less streamlined than native Liquid Clustering support in dbt-databricks would be. The other alternative is to configure and maintain these optimizations outside dbt, directly in Databricks or through Databricks SQL.
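To make the workaround concrete, here is a minimal sketch (Python, illustration only) of the statements a post-hook would have to carry for each approach. The helper names are hypothetical; `{{ this }}` is the dbt placeholder for the current model's relation, and the SQL forms (OPTIMIZE ... ZORDER BY, ALTER TABLE ... CLUSTER BY, OPTIMIZE) are Databricks SQL.

```python
from typing import List, Sequence

# dbt resolves this Jinja expression to the current model's relation at run time.
THIS = "{{ this }}"


def zorder_post_hooks(columns: Sequence[str]) -> List[str]:
    """Post-hook statement for the Z-Ordering workaround (hypothetical helper)."""
    return [f"OPTIMIZE {THIS} ZORDER BY ({', '.join(columns)})"]


def liquid_clustering_post_hooks(columns: Sequence[str]) -> List[str]:
    """Post-hook statements that set clustering keys by hand, then trigger clustering."""
    return [
        f"ALTER TABLE {THIS} CLUSTER BY ({', '.join(columns)})",
        f"OPTIMIZE {THIS}",
    ]


if __name__ == "__main__":
    for statement in liquid_clustering_post_hooks(["customer_id", "order_date"]):
        print(statement)
```

Each model that wants clustering has to repeat these statements in its own config, which is the manual overhead this request is trying to remove.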
Additional context
This feature would expose Databricks' native Liquid Clustering through dbt-databricks, allowing for a more seamless workflow. It would also help dbt users take full advantage of the data layout optimizations available for Delta Lake tables on Databricks without writing custom post-hooks or running separate Databricks SQL commands.
Who will this benefit?
This feature would benefit dbt-databricks users who work with large-scale data on Databricks and need more than static partitioning. For example, data engineers managing large datasets (e.g., sales transactions, event logs) who need to optimize query performance around how the data is actually accessed would benefit from having Liquid Clustering as a native dbt configuration option.
Are you interested in contributing this feature?
I am interested in contributing to this feature if needed. Please let me know how I can assist or what guidance you can provide in implementing this functionality.