aws-samples / dbt-glue

This repository contains the dbt-glue adapter
Apache License 2.0
101 stars 69 forks source link

Fix the case-sensitive comparison on the seed name #436

Closed stephanetrou closed 2 months ago

stephanetrou commented 2 months ago

resolves #429

Description

To reproduce the Bug create a seed file named MySeed.csv in your seed folder with the following content :

adapter,version
glue,1.8.1

Run 2 times : dbt seed

The first time the table did not exist in glue metastore, so the command pass.

❯ dbt seed
14:37:32  Running with dbt=1.8.6
14:37:33  Registered adapter: glue=1.8.1
14:37:33  Found 1 seed, 879 macros
14:37:33
14:37:34  Concurrency: 1 threads (target='dev-glue')
14:37:34
14:37:34  1 of 1 START seed file dbt.MySeed .............................................. [RUN]
14:37:51  1 of 1 OK loaded seed file dbt.MySeed .......................................... [CREATE 1 in 16.85s]
14:37:51
14:37:51  Finished running 1 seed in 0 hours 0 minutes and 17.59 seconds (17.59s).
14:37:51
14:37:51  Completed successfully
14:37:51
14:37:51  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

The second time the command did not pass

❯ dbt seed
14:37:58  Running with dbt=1.8.6
14:37:58  Registered adapter: glue=1.8.1
14:37:58  Found 1 seed, 879 macros
14:37:58
14:37:59  Concurrency: 1 threads (target='dev-glue')
14:37:59
14:37:59  1 of 1 START seed file dbt.MySeed .............................................. [RUN]
14:38:03  Glue adapter: Glue returned `error` for statement None for code

csv = [{"adapter": "glue", "version": "1.8.1"}]
df = spark.createDataFrame(csv)
table_name = 'dbt.MySeed'
if (spark.sql("show tables in dbt").where("tableName == 'MySeed'").count() > 0):
    df.write        .mode("overwrite")        .format("parquet")        .insertInto(table_name, overwrite=True)
else:
    df.write        .option("path", "s3://dbt-glue-test-xxxxxxxxxxxx/simple-test/dbt/MySeed")        .format("parquet")        .saveAsTable(table_name)
SqlWrapper2.execute("""select * from dbt.MySeed limit 1""")
, AnalysisException: Table `dbt`.`MySeed` already exists.
14:38:03  1 of 1 ERROR loading seed file dbt.MySeed ...................................... [ERROR in 3.59s]
14:38:03
14:38:03  Finished running 1 seed in 0 hours 0 minutes and 4.26 seconds (4.26s).
14:38:03
14:38:03  Completed with 1 error and 0 warnings:
14:38:03
14:38:03    Database Error in seed MySeed (seeds/MySeed.csv)
  GlueCreateCsvFailed
14:38:03
14:38:03  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

I find in the Athena documentation (sorry not in Glue documentation) Database, table, and column name requirements

Acceptable characters for database names, table names, and column names in AWS Glue must be a UTF-8 string and should be in lower case.

To solve this issue I added lower function in the where statement ("tableName == lower('MySeed')").

dbt seed
14:44:46  Running with dbt=1.8.6
14:44:46  Registered adapter: glue=1.8.1
14:44:47  Found 1 seed, 879 macros
14:44:47
14:44:48  Concurrency: 1 threads (target='dev-glue')
14:44:48
14:44:48  1 of 1 START seed file dbt.MySeed .............................................. [RUN]
14:44:53  1 of 1 OK loaded seed file dbt.MySeed .......................................... [CREATE 1 in 5.74s]
14:44:53
14:44:53  Finished running 1 seed in 0 hours 0 minutes and 6.51 seconds (6.51s).
14:44:53
14:44:53  Completed successfully
14:44:53
14:44:53  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
❯ dbt seed
15:02:45  Running with dbt=1.8.6
15:02:45  Registered adapter: glue=1.8.1
15:02:46  Found 1 seed, 879 macros
15:02:46
15:03:04  Concurrency: 1 threads (target='dev-glue')
15:03:04
15:03:04  1 of 1 START seed file dbt.MySeed .............................................. [RUN]
15:04:38  1 of 1 OK loaded seed file dbt.MySeed .......................................... [CREATE 1 in 93.38s]
15:04:38
15:04:38  Finished running 1 seed in 0 hours 1 minutes and 51.96 seconds (111.96s).
15:04:38
15:04:38  Completed successfully
15:04:38
15:04:38  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

Checklist

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

sanga8 commented 2 months ago

Nice, working on my side. @moomindani could you please take a look when you have some time ? thanks!

stephanetrou commented 2 months ago

Conflict on CHANGELOG.md fixed.

moomindani commented 2 months ago

Thank you for your contribution :)