Tomme / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
Apache License 2.0

BUG: dbt seed fails with CSV having column named "date" #101

Closed: dsaxton closed this issue 2 years ago

dsaxton commented 2 years ago

Using dbt-athena at commit 192136a5eb069dcc5b80aff6fc81d45f06a7bf89, dbt seed fails on the seed data below (the command output is shown first, followed by the CSV contents):

dsaxton:~/git-repos/dbt-scratch/scratch$ dbt seed
23:20:58  Running with dbt=1.1.0
23:20:58  Found 2 models, 4 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
23:20:58  
23:21:04  Concurrency: 1 threads (target='dev')
23:21:04  
23:21:04  1 of 1 START seed file taxi_data.trip_counts ................................... [RUN]
23:22:27  1 of 1 ERROR loading seed file taxi_data.trip_counts ........................... [ERROR in 83.08s]
23:22:27  
23:22:27  Finished running 1 seed in 88.47s.
23:22:27  
23:22:27  Completed with 1 error and 0 warnings:
23:22:27  
23:22:27  Runtime Error in seed trip_counts (seeds/trip_counts.csv)
23:22:27    FAILED: ParseException line 2:67 cannot recognize input near 'integer' ')' 'stored' in column type
23:22:27  
23:22:27  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
dsaxton:~/git-repos/dbt-scratch/scratch$ cat seeds/trip_counts.csv 
date,trip_count
2020-01-01,1
2021-01-01,2

Renaming the date column to trip_date makes the seed load successfully:

dsaxton:~/git-repos/dbt-scratch/scratch$ dbt seed
23:23:39  Running with dbt=1.1.0
23:23:39  Found 2 models, 4 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
23:23:39  
23:23:45  Concurrency: 1 threads (target='dev')
23:23:45  
23:23:45  1 of 1 START seed file taxi_data.trip_counts ................................... [RUN]
23:23:49  1 of 1 OK loaded seed file taxi_data.trip_counts ............................... [INSERT 2 in 4.00s]
23:23:49  
23:23:49  Finished running 1 seed in 10.17s.
23:23:49  
23:23:49  Completed successfully
23:23:49  
23:23:49  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
dsaxton:~/git-repos/dbt-scratch/scratch$ cat seeds/trip_counts.csv 
trip_date,trip_count
2020-01-01,1
2021-01-01,2
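
For anyone who needs the same workaround on an existing file, here is a minimal sketch of the rename in Python. It is not part of dbt or dbt-athena; the function name rename_header_column is made up for illustration, and it only rewrites the header row, leaving the data rows as they are.

```python
import csv
import io


def rename_header_column(csv_text: str, old: str, new: str) -> str:
    """Return csv_text with the header column `old` renamed to `new`.

    Hypothetical helper for illustration only; not a dbt or dbt-athena API.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        # Nothing to rename in an empty file.
        return csv_text
    # Only the header row is touched; data rows are written back unchanged.
    rows[0] = [new if name == old else name for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# The failing seed from this report, with the workaround applied.
original = "date,trip_count\n2020-01-01,1\n2021-01-01,2\n"
print(rename_header_column(original, "date", "trip_date"), end="")
```

Any model that selects from the seed would also need to refer to the new column name.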

With string data in the second column, and date still used as the first column name, the seed fails with a different error message:

dsaxton:~/git-repos/dbt-scratch/scratch$ dbt seed
23:26:51  Running with dbt=1.1.0
23:26:51  Found 2 models, 4 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
23:26:51  
23:26:57  Concurrency: 1 threads (target='dev')
23:26:57  
23:26:57  1 of 1 START seed file taxi_data.trip_counts ................................... [RUN]
23:28:24  1 of 1 ERROR loading seed file taxi_data.trip_counts ........................... [ERROR in 87.06s]
23:28:24  
23:28:24  Finished running 1 seed in 92.63s.
23:28:24  
23:28:24  Completed with 1 error and 0 warnings:
23:28:24  
23:28:24  Runtime Error in seed trip_counts (seeds/trip_counts.csv)
23:28:24    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.UnsupportedOperationException: Parquet does not support date. See HIVE-6384
23:28:24  
23:28:24  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1
dsaxton:~/git-repos/dbt-scratch/scratch$ cat seeds/trip_counts.csv 
date,value
2020-01-01,hello
2021-01-01,world

Again, renaming date to trip_date fixes the error:

dsaxton:~/git-repos/dbt-scratch/scratch$ dbt seed
23:29:16  Running with dbt=1.1.0
23:29:16  Found 2 models, 4 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
23:29:16  
23:29:22  Concurrency: 1 threads (target='dev')
23:29:22  
23:29:22  1 of 1 START seed file taxi_data.trip_counts ................................... [RUN]
23:29:26  1 of 1 OK loaded seed file taxi_data.trip_counts ............................... [INSERT 2 in 4.02s]
23:29:26  
23:29:26  Finished running 1 seed in 9.55s.
23:29:26  
23:29:26  Completed successfully
23:29:26  
23:29:26  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
dsaxton:~/git-repos/dbt-scratch/scratch$ cat seeds/trip_counts.csv 
trip_date,value
2020-01-01,hello
2021-01-01,world
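
Until this is fixed, a pre-flight check on seed headers may save an 80+ second failed run. The sketch below is a hypothetical helper, not something dbt or dbt-athena provides. The set of suspect names is an assumption: only date is confirmed by the runs above, and other names would have to be added after checking them yourself.

```python
import csv
import io

# Assumption: only "date" is confirmed by the runs in this report.
# Any other name added here is a guess until verified against Athena.
SUSPECT_NAMES = {"date"}


def suspect_columns(csv_text, suspects=SUSPECT_NAMES):
    """Return the header names in csv_text that match a suspect name.

    Matching ignores case and surrounding whitespace.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file has no header
    return [name for name in header if name.strip().lower() in suspects]


# The two header variants from the second pair of runs above.
print(suspect_columns("date,value\n2020-01-01,hello\n2021-01-01,world\n"))
print(suspect_columns("trip_date,value\n2020-01-01,hello\n2021-01-01,world\n"))
```

The first call flags the date column and the second flags nothing, matching which of the two seeds failed and which loaded.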