resolves #430
Description

To reproduce the bug, create a seed file `myseed.csv` with the following content:
```
adapter,version
glue,1.8.1
spanner,
```
Launch `dbt seed` and you'll get this output:
```
❯ dbt seed
16:35:37  Running with dbt=1.8.6
16:35:37  Registered adapter: glue=1.8.1
16:35:38  Found 1 seed, 879 macros
16:35:38
16:35:38  Concurrency: 1 threads (target='dev-glue')
16:35:38
16:35:38  1 of 1 START seed file dbt.myseed .............................................. [RUN]
16:36:38  Glue adapter: Glue returned `error` for statement None for code
csv = [{"adapter": "glue", "version": "1.8.1"}, {"adapter": "spanner", "version": null}]
df = spark.createDataFrame(csv)
table_name = 'dbt.myseed'
if (spark.sql("show tables in dbt").where("tableName == 'myseed'").count() > 0):
      df.write
      .mode("overwrite")
      .format("parquet")
      .insertInto(table_name, overwrite=True)
else:
      df.write
      .option("path", "s3://dbt-glue-test-xxxxxxxxx/simple-test/dbt/myseed")
      .format("parquet")
      .saveAsTable(table_name)
SqlWrapper2.execute("""select * from dbt.myseed limit 1""")
, NameError: name 'null' is not defined
16:36:38  1 of 1 ERROR loading seed file dbt.myseed ...................................... [ERROR in 60.28s]
16:36:39
16:36:39  Finished running 1 seed in 0 hours 1 minutes and 1.01 seconds (61.01s).
16:36:39
16:36:39  Completed with 1 error and 0 warnings:
16:36:39
16:36:39    Database Error in seed myseed (seeds/myseed.csv)
  GlueCreateCsvFailed
16:36:39
16:36:39  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
```
As @jausanca wrote, the problem comes from the JSON serialization.
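A minimal standalone reproduction of that failure mode (the variable names here are illustrative, not the adapter's internals): `json.dumps` renders missing values as the JSON literal `null`, which is not a valid Python name, so executing the serialized string as Python source raises the same `NameError` seen in the log above.

```python
import json

# json.dumps turns Python's None into JSON's `null`.
rows_json = json.dumps([{"adapter": "spanner", "version": None}])
print(rows_json)  # [{"adapter": "spanner", "version": null}]

# Embedding this text verbatim in generated Python code and executing it
# fails, because `null` is read as an (undefined) Python name.
try:
    eval(rows_json)
except NameError as e:
    print(e)  # name 'null' is not defined
```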
The fix uses the `json` module to convert the JSON string into a Python dict. Here is an example of the code produced with the fix applied:
```
❯ dbt seed
16:46:34  Running with dbt=1.8.6
16:46:34  Registered adapter: glue=1.8.1
16:46:35  Found 1 seed, 879 macros
16:46:35
16:46:35  Concurrency: 1 threads (target='dev-glue')
16:46:35
16:46:35  1 of 1 START seed file dbt.myseed .............................................. [RUN]
custom_glue_code_for_dbt_adapter
csv = [{'adapter': 'glue', 'version': '1.8.1'}, {'adapter': 'spanner', 'version': None}]
df = spark.createDataFrame(csv)
table_name = 'dbt.myseed'
if (spark.sql("show tables in dbt").where("tableName == 'myseed'").count() > 0):
      df.write
      .mode("overwrite")
      .format("parquet")
      .insertInto(table_name, overwrite=True)
else:
      df.write
      .option("path", "s3://dbt-glue-test-xxxxxxxxx/simple-test/dbt/myseed")
      .format("parquet")
      .saveAsTable(table_name)
SqlWrapper2.execute("""select * from dbt.myseed limit 1""")
16:46:40  1 of 1 OK loaded seed file dbt.myseed .......................................... [CREATE 2 in 4.70s]
16:46:40
16:46:40  Finished running 1 seed in 0 hours 0 minutes and 5.40 seconds (5.40s).
16:46:40
16:46:40  Completed successfully
16:46:40
16:46:40  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
```
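For reference, a minimal sketch of the idea behind the fix (assumed names, not the adapter's actual code): round-tripping the serialized seed rows through `json.loads` yields Python objects, so `null` becomes `None`, whose `repr()` is valid Python source and can be embedded safely in the generated Glue code.

```python
import json

rows_json = '[{"adapter": "glue", "version": "1.8.1"}, {"adapter": "spanner", "version": null}]'

# Parse the JSON text back into Python objects: `null` becomes None.
rows = json.loads(rows_json)

# repr() of the parsed data is valid Python source, unlike the raw JSON.
generated = f"csv = {rows!r}"
print(generated)
# csv = [{'adapter': 'glue', 'version': '1.8.1'}, {'adapter': 'spanner', 'version': None}]
```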
Checklist

I have updated `CHANGELOG.md` and added information about my change to the "dbt-glue next" section.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Nice, working on my side. @moomindani, could you please take a look when you have some time? Thanks!
Thank you for your contribution!