Closed stephanetrou closed 2 months ago
resolves #429
To reproduce the Bug create a seed file named MySeed.csv in your seed folder with the following content :
MySeed.csv
adapter,version glue,1.8.1
Run 2 times : dbt seed
dbt seed
The first time the table did not exist in glue metastore, so the command pass.
❯ dbt seed 14:37:32 Running with dbt=1.8.6 14:37:33 Registered adapter: glue=1.8.1 14:37:33 Found 1 seed, 879 macros 14:37:33 14:37:34 Concurrency: 1 threads (target='dev-glue') 14:37:34 14:37:34 1 of 1 START seed file dbt.MySeed .............................................. [RUN] 14:37:51 1 of 1 OK loaded seed file dbt.MySeed .......................................... [CREATE 1 in 16.85s] 14:37:51 14:37:51 Finished running 1 seed in 0 hours 0 minutes and 17.59 seconds (17.59s). 14:37:51 14:37:51 Completed successfully 14:37:51 14:37:51 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
The second time the command did not pass
❯ dbt seed 14:37:58 Running with dbt=1.8.6 14:37:58 Registered adapter: glue=1.8.1 14:37:58 Found 1 seed, 879 macros 14:37:58 14:37:59 Concurrency: 1 threads (target='dev-glue') 14:37:59 14:37:59 1 of 1 START seed file dbt.MySeed .............................................. [RUN] 14:38:03 Glue adapter: Glue returned `error` for statement None for code csv = [{"adapter": "glue", "version": "1.8.1"}] df = spark.createDataFrame(csv) table_name = 'dbt.MySeed' if (spark.sql("show tables in dbt").where("tableName == 'MySeed'").count() > 0): df.write .mode("overwrite") .format("parquet") .insertInto(table_name, overwrite=True) else: df.write .option("path", "s3://dbt-glue-test-xxxxxxxxxxxx/simple-test/dbt/MySeed") .format("parquet") .saveAsTable(table_name) SqlWrapper2.execute("""select * from dbt.MySeed limit 1""") , AnalysisException: Table `dbt`.`MySeed` already exists. 14:38:03 1 of 1 ERROR loading seed file dbt.MySeed ...................................... [ERROR in 3.59s] 14:38:03 14:38:03 Finished running 1 seed in 0 hours 0 minutes and 4.26 seconds (4.26s). 14:38:03 14:38:03 Completed with 1 error and 0 warnings: 14:38:03 14:38:03 Database Error in seed MySeed (seeds/MySeed.csv) GlueCreateCsvFailed 14:38:03 14:38:03 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
I find in the Athena documentation (sorry not in Glue documentation) Database, table, and column name requirements
Acceptable characters for database names, table names, and column names in AWS Glue must be a UTF-8 string and should be in lower case.
To solve this issue I added lower function in the where statement ("tableName == lower('MySeed')").
("tableName == lower('MySeed')")
dbt seed 14:44:46 Running with dbt=1.8.6 14:44:46 Registered adapter: glue=1.8.1 14:44:47 Found 1 seed, 879 macros 14:44:47 14:44:48 Concurrency: 1 threads (target='dev-glue') 14:44:48 14:44:48 1 of 1 START seed file dbt.MySeed .............................................. [RUN] 14:44:53 1 of 1 OK loaded seed file dbt.MySeed .......................................... [CREATE 1 in 5.74s] 14:44:53 14:44:53 Finished running 1 seed in 0 hours 0 minutes and 6.51 seconds (6.51s). 14:44:53 14:44:53 Completed successfully 14:44:53 14:44:53 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1 ❯ dbt seed 15:02:45 Running with dbt=1.8.6 15:02:45 Registered adapter: glue=1.8.1 15:02:46 Found 1 seed, 879 macros 15:02:46 15:03:04 Concurrency: 1 threads (target='dev-glue') 15:03:04 15:03:04 1 of 1 START seed file dbt.MySeed .............................................. [RUN] 15:04:38 1 of 1 OK loaded seed file dbt.MySeed .......................................... [CREATE 1 in 93.38s] 15:04:38 15:04:38 Finished running 1 seed in 0 hours 1 minutes and 51.96 seconds (111.96s). 15:04:38 15:04:38 Completed successfully 15:04:38 15:04:38 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
CHANGELOG.md
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Nice, working on my side. @moomindani could you please take a look when you have some time ? thanks!
Conflict on CHANGELOG.md fixed.
Thank you for your contribution :)
resolves #429
Description
To reproduce the Bug create a seed file named
MySeed.csv
in your seed folder with the following content :Run 2 times :
dbt seed
The first time the table did not exist in glue metastore, so the command pass.
The second time the command did not pass
I find in the Athena documentation (sorry not in Glue documentation) Database, table, and column name requirements
To solve this issue I added lower function in the where statement
("tableName == lower('MySeed')")
.Checklist
CHANGELOG.md
and added information about my change to the "dbt-glue next" section.By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.