Tomme / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
Apache License 2.0
140 stars 79 forks source link

Query isn't properly formatted when table is partitioned #77

Open Batto1300 opened 2 years ago

Batto1300 commented 2 years ago

Hi, I'm trying to run

{{ config(partitioned_by='day') }}
select 
    a,
    b,
    c
 from random_table

I can see that in target/run the query is being compiled to

create table
    dev_datawarehouse.test

    with (
        partitioned_by=ARRAY'day',
        format='parquet'
    )
  as

select 
    a,
    b,
    c
 from random_table

Which gives the following error: SYNTAX_ERROR: Unknown type: ARRAY. the compiled query should instead be

create table
    dev_datawarehouse.test

    with (
        partitioned_by=ARRAY['day'],
        format='parquet'
    )
  as

select 
    a,
    b,
    c
 from random_table

Which works just fine. Am I missing something obvious?

Batto1300 commented 2 years ago

Ok I've solved the problem: this is wrong {{ config(partitioned_by='day') }} should be {{ config(partitioned_by=['day']) }} what is less obvious is that if you declare the partition column at the project level like so:

example:
      partitioned_by:
          - day

you'll experience the same error. You should declare it like so:

example:
      partitioned_by: [day]

This StackOverflow answer saying that the 2 forms are equivalent threw me off a bit https://stackoverflow.com/questions/23657086/yaml-multi-line-arrays

aut0clave commented 2 years ago

To partition by a date column, use syntax like this:

{{ config( partitioned_by = [ 'some_date_column' ] ) }}

select a, b, some_date_column
  from random_table