aws-samples / dbt-glue

This repository contains the dbt-glue adapter
Apache License 2.0

Spark conf order has an impact #251

Closed: sanga8 closed this issue 1 year ago

sanga8 commented 1 year ago

Describe the bug

The order of the Spark `--conf` settings in `profiles.yml` affects whether the run succeeds.

Steps To Reproduce

profiles.yml

      conf: |
        --conf spark.sql.legacy.allowNonEmptyLocationInCTAS=true
        --conf spark.sql.catalog.glue_catalog.warehouse=s3://mys3
        --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
        --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
        --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions

This configuration produces the following error:

AnalysisException: CREATE-TABLE-AS-SELECT cannot create table with location to a non-empty directory s3://mys3/my_db/my_table . To allow overwriting the existing non-empty directory, set 'spark.sql.legacy.allowNonEmptyLocationInCTAS' to true.

Moving the first setting (`spark.sql.legacy.allowNonEmptyLocationInCTAS=true`) to the last position resolves the error:

      conf: |
        --conf spark.sql.catalog.glue_catalog.warehouse=s3://mys3
        --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
        --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
        --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
        --conf spark.sql.legacy.allowNonEmptyLocationInCTAS=true

Expected behavior

The order of the `--conf` settings should not affect the run.

System information

dbt-glue: 1.6.5, Python: 3.9.11

itsAlexK commented 1 year ago

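I believe the first setting needs to be written without the `--conf` prefix: the whole string is passed as the value of Glue's own `--conf` argument, so only the second and later settings should be preceded by `--conf`.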

See https://stackoverflow.com/questions/55523705/how-do-i-set-multiple-conf-table-parameters-in-aws-glue
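A minimal Python sketch of the format described above. `build_glue_conf` and `parse_confs` are hypothetical helpers, not part of dbt-glue or the AWS Glue API. `parse_confs` is a toy model that assumes Glue places its own `--conf` flag in front of the value and that each `--conf` consumes the following token; the real Glue/Spark argument handling is not shown in this thread and may differ (for example, by rejecting a malformed pair instead of silently skipping it). Under that model, a leading `--conf` pairs Glue's flag with a second `--conf`, so whichever setting comes first in the list is lost. That would explain why moving `spark.sql.legacy.allowNonEmptyLocationInCTAS` to the end made the error go away (and suggests the reordered profile silently drops `spark.sql.catalog.glue_catalog.warehouse` instead).

```python
import shlex


def build_glue_conf(settings):
    """Join Spark settings into one value for Glue's --conf argument.

    The first setting has no "--conf" prefix; every later one is
    preceded by "--conf", as in the linked Stack Overflow answer.
    """
    return " --conf ".join(f"{key}={value}" for key, value in settings.items())


def parse_confs(conf_value):
    """Toy model (assumption, not Glue's real parser).

    Glue is assumed to prepend its own "--conf" flag to the value, and
    each "--conf" consumes exactly the next token as a key=value pair.
    """
    tokens = ["--conf"] + shlex.split(conf_value)
    confs = {}
    i = 0
    while i < len(tokens):
        if tokens[i] == "--conf" and i + 1 < len(tokens):
            pair = tokens[i + 1]  # the flag always consumes the next token
            if "=" in pair and not pair.startswith("--"):
                key, _, value = pair.partition("=")
                confs[key] = value
            # A second "--conf" is consumed as a bogus value here, so the
            # setting written after it is never attached to any flag.
            i += 2
        else:
            i += 1  # a bare token not preceded by --conf is ignored
    return confs


if __name__ == "__main__":
    settings = {
        "spark.sql.legacy.allowNonEmptyLocationInCTAS": "true",
        "spark.sql.catalog.glue_catalog.warehouse": "s3://mys3",
    }
    fixed = build_glue_conf(settings)
    print(fixed)
    # spark.sql.legacy.allowNonEmptyLocationInCTAS=true --conf spark.sql.catalog.glue_catalog.warehouse=s3://mys3
    print(sorted(parse_confs("--conf " + fixed)))
    # ['spark.sql.catalog.glue_catalog.warehouse']
```

In `profiles.yml` terms, this means the `conf` value should start directly with the first `key=value` pair, with `--conf` only between subsequent pairs.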

sanga8 commented 1 year ago

Thank you, I'll close this issue.