Closed dsaxton closed 2 years ago
Using dbt-athena from commit 192136a5eb069dcc5b80aff6fc81d45f06a7bf89 we get an error running dbt seed with the following seed data:
dsaxton:~/git-repos/dbt-scratch/scratch$ dbt seed
23:20:58 Running with dbt=1.1.0
23:20:58 Found 2 models, 4 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
23:20:58
23:21:04 Concurrency: 1 threads (target='dev')
23:21:04
23:21:04 1 of 1 START seed file taxi_data.trip_counts ................................... [RUN]
23:22:27 1 of 1 ERROR loading seed file taxi_data.trip_counts ........................... [ERROR in 83.08s]
23:22:27
23:22:27 Finished running 1 seed in 88.47s.
23:22:27
23:22:27 Completed with 1 error and 0 warnings:
23:22:27
23:22:27 Runtime Error in seed trip_counts (seeds/trip_counts.csv)
23:22:27 FAILED: ParseException line 2:67 cannot recognize input near 'integer' ')' 'stored' in column type
23:22:27
23:22:27 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

dsaxton:~/git-repos/dbt-scratch/scratch$ cat seeds/trip_counts.csv
date,trip_count
2020-01-01,1
2021-01-01,2
Renaming column date to trip_date causes it to work:
dsaxton:~/git-repos/dbt-scratch/scratch$ dbt seed
23:23:39 Running with dbt=1.1.0
23:23:39 Found 2 models, 4 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
23:23:39
23:23:45 Concurrency: 1 threads (target='dev')
23:23:45
23:23:45 1 of 1 START seed file taxi_data.trip_counts ................................... [RUN]
23:23:49 1 of 1 OK loaded seed file taxi_data.trip_counts ............................... [INSERT 2 in 4.00s]
23:23:49
23:23:49 Finished running 1 seed in 10.17s.
23:23:49
23:23:49 Completed successfully
23:23:49
23:23:49 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

dsaxton:~/git-repos/dbt-scratch/scratch$ cat seeds/trip_counts.csv
trip_date,trip_count
2020-01-01,1
2021-01-01,2
Using different data for the second column, but still using date as the first column name, we get a different error message:
dsaxton:~/git-repos/dbt-scratch/scratch$ dbt seed
23:26:51 Running with dbt=1.1.0
23:26:51 Found 2 models, 4 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
23:26:51
23:26:57 Concurrency: 1 threads (target='dev')
23:26:57
23:26:57 1 of 1 START seed file taxi_data.trip_counts ................................... [RUN]
23:28:24 1 of 1 ERROR loading seed file taxi_data.trip_counts ........................... [ERROR in 87.06s]
23:28:24
23:28:24 Finished running 1 seed in 92.63s.
23:28:24
23:28:24 Completed with 1 error and 0 warnings:
23:28:24
23:28:24 Runtime Error in seed trip_counts (seeds/trip_counts.csv)
23:28:24 FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.UnsupportedOperationException: Parquet does not support date. See HIVE-6384
23:28:24
23:28:24 Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

dsaxton:~/git-repos/dbt-scratch/scratch$ cat seeds/trip_counts.csv
date,value
2020-01-01,hello
2021-01-01,world
Again, renaming date to trip_date fixes the error:
dsaxton:~/git-repos/dbt-scratch/scratch$ dbt seed
23:29:16 Running with dbt=1.1.0
23:29:16 Found 2 models, 4 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
23:29:16
23:29:22 Concurrency: 1 threads (target='dev')
23:29:22
23:29:22 1 of 1 START seed file taxi_data.trip_counts ................................... [RUN]
23:29:26 1 of 1 OK loaded seed file taxi_data.trip_counts ............................... [INSERT 2 in 4.02s]
23:29:26
23:29:26 Finished running 1 seed in 9.55s.
23:29:26
23:29:26 Completed successfully
23:29:26
23:29:26 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1

dsaxton:~/git-repos/dbt-scratch/scratch$ cat seeds/trip_counts.csv
trip_date,value
2020-01-01,hello
2021-01-01,world
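
Both failures look consistent with the column name date being read as a type keyword in the generated DDL, though that is an inference from the error messages and not something confirmed here. As a stopgap, a small pre-check along these lines could flag seed headers that might collide with reserved words before running dbt seed. This is only a sketch: the function name reserved_headers and the word list ASSUMED_RESERVED are hypothetical, and the list is a deliberately small, assumed subset, not Athena's full reserved-word list.

```python
import csv
import io

# Assumption: a small, incomplete sample of words that may be reserved in
# Athena DDL. Consult the Athena documentation for the authoritative list.
ASSUMED_RESERVED = {"date", "timestamp", "table", "partition"}


def reserved_headers(csv_text, reserved=ASSUMED_RESERVED):
    """Return the header names in the first CSV row that appear in `reserved`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    return [name for name in header if name.strip().lower() in reserved]


# The failing seed from this report: the first column is named "date".
print(reserved_headers("date,trip_count\n2020-01-01,1\n2021-01-01,2\n"))  # ['date']

# The renamed seed that loads successfully.
print(reserved_headers("trip_date,trip_count\n2020-01-01,1\n2021-01-01,2\n"))  # []
```

Renaming the flagged column, as done above with trip_date, avoids the problem without any change to the adapter.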