databricks / dbt-databricks

A dbt adapter for Databricks.
https://databricks.com
Apache License 2.0
212 stars 114 forks source link

Add support for auto-generating column comments in Delta Live Tables from dbt model descriptions #793

Open Amin-Siddique opened 4 days ago

Amin-Siddique commented 4 days ago

Describe the feature

The feature adds the ability to automatically generate SQL column comments in Databricks Delta Live Tables (DLT). Specifically, it updates the CREATE STREAMING TABLE SQL generation to include column comments using the COMMENT clause, which can be pulled from the dbt column description metadata.

This allows users to define meaningful metadata for their columns directly in dbt models and have those automatically applied as SQL column comments when creating streaming tables in Databricks.

Describe alternatives you've considered

  1. Manual Column Commenting: One alternative is manually adding comments in SQL when defining the table schema. This approach, however, can be prone to inconsistencies and might result in duplicate efforts if the comments are already defined in dbt models.
  2. Post-Creation ALTER Statements: Not supported

Additional context

Here's an example of the newly implemented code:

{%- macro get_create_column_comment(model) -%}
  {% set node = model %}
  {% if node.columns %}
    {% set column_definitions = [] %}
    {% for column_name, column in node.columns.items() %}
      {% set data_type = column.data_type or 'STRING' %} 
      {% set column_line = column_name ~ ' ' ~ data_type %}
      {% if column.description %}
        {% set column_line = column_line ~ ' COMMENT ' ~ "'" ~ column.description ~ "'" %}
      {% endif %}
      {% do column_definitions.append(column_line) %}
    {% endfor %}
    {{ column_definitions | join(',\n    ') }}
  {% endif %}
{%- endmacro -%}

{% macro databricks__get_create_streaming_table_as_sql(relation, sql) -%}
  CREATE STREAMING TABLE {{ relation }} 
    ( {{ get_create_column_comment(config.model) }} )
    -- Additional clauses like partitioning and properties
    AS {{ sql }}
{% endmacro %}

The code leverages dbt metadata (node.columns) to automatically generate column comments during table creation. This ensures consistency and reduces manual overhead for managing schema documentation.

Who will this benefit?

This feature will benefit:

Example Use Case: A team might have a dbt model that defines columns with descriptions. When they deploy this model to a Databricks Delta Live Table, the column comments would automatically be included, helping others understand the purpose and meaning of each column.

Additionally, we can also pull column constraints.

Are you interested in contributing this feature?

Yes, I'm interested in contributing to this feature. Please let me know if additional steps are required to prepare it for contribution or if any changes are needed. I am happy to discuss the implementation further.

benc-db commented 3 days ago

This was initially not implemented due to issues I was having with materialized views and column comments; maybe this is no longer the case, and either way, I think it should work fine with STs. If you are interested in adding this, we would need to follow the pattern we use for other components of MV/ST so that we can detect when the dbt project changes relative to what is stored in Databricks.

Amin-Siddique commented 2 days ago

Thank you @benc-db ! I've submitted a PR with the changes for both ST and MV tables. Since this is my first time contributing to open source, I would really appreciate any guidance or feedback! 😊