dbt-labs / dbt-spark

dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks
https://getdbt.com
Apache License 2.0

Include missing `tblproperties_clause()`. #862

Closed: martelli closed this pull request 1 year ago

martelli commented 1 year ago

The call to `tblproperties_clause()` is missing from the code that builds the CREATE TABLE statement. It is needed to use dbt-spark on top of Iceberg/S3, because some parameters must be passed at table creation time.

Problem

When creating tables on Spark/Iceberg/S3, dbt does not honor the `tblproperties` defined in the model config. Several properties must be set at CREATE TABLE time. In my case, I need to enable `write.object-storage.enabled` to avoid being throttled by AWS S3.

Solution

Including the call to `tblproperties_clause()` inside the macro `spark__create_table_as` adds the properties to the CREATE TABLE statement, which fixes the issue.
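To illustrate the effect of the change, here is a minimal Python sketch of the idea. It is not the dbt-spark macro itself, which is written in Jinja; the helper names `render_tblproperties_clause` and `build_create_table_as` are hypothetical, and the sketch assumes property keys and values are plain strings that need no quote escaping.

```python
from typing import Mapping, Optional, Sequence


def render_tblproperties_clause(tblproperties: Optional[Mapping[str, str]]) -> str:
    """Render a TBLPROPERTIES clause, or an empty string when nothing is configured."""
    if not tblproperties:
        return ""
    # Assumption: keys and values are plain strings; no quote escaping is done here.
    pairs = ", ".join(f"'{key}' = '{value}'" for key, value in tblproperties.items())
    return f"TBLPROPERTIES ({pairs})"


def build_create_table_as(
    relation: str,
    select_sql: str,
    file_format: Optional[str] = None,
    partition_by: Sequence[str] = (),
    tblproperties: Optional[Mapping[str, str]] = None,
) -> str:
    """Assemble a CREATE TABLE ... AS statement from optional clauses (hypothetical helper)."""
    clauses = [f"CREATE TABLE {relation}"]
    if file_format:
        clauses.append(f"USING {file_format}")
    if partition_by:
        clauses.append(f"PARTITIONED BY ({', '.join(partition_by)})")
    # The step that was missing: without it, configured properties never reach the statement.
    props = render_tblproperties_clause(tblproperties)
    if props:
        clauses.append(props)
    clauses.append(f"AS {select_sql}")
    return "\n".join(clauses)


if __name__ == "__main__":
    print(
        build_create_table_as(
            "first_dbt_model_1",
            "SELECT 1 AS id",
            file_format="iceberg",
            partition_by=["id"],
            tblproperties={"write.object-storage.enabled": "true"},
        )
    )
```

When no `tblproperties` are configured, the clause renders as an empty string and the statement is unchanged, so models that do not set properties are unaffected.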

Checklist

Test Plan:

The example model used the following config:

{{ config(
    materialized='table',
    tblproperties={'write.object-storage.enabled':'true'},
    partition_by=['id'],
    file_format='iceberg'
) }}

It produced:

1: jdbc:hive2://localhost:10000> show tblproperties first_dbt_model_1;
+-------------------------------+----------------------+
|              key              |        value         |
+-------------------------------+----------------------+
| current-snapshot-id           | 6381028787934334351  |
| format                        | iceberg/parquet      |
| format-version                | 1                    |
| write.object-storage.enabled  | true                 |
+-------------------------------+----------------------+
4 rows selected (0.86 seconds)
1: jdbc:hive2://localhost:10000>

cla-bot[bot] commented 1 year ago

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR.

CLA has not been signed by users: @martelli

etheleon commented 1 year ago

Duplicate of https://github.com/dbt-labs/dbt-spark/pull/848

martelli commented 1 year ago

Fixed in #848.