Tomme / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
Apache License 2.0

Support specifying compression type for Parquet and ORC models #31

Open aut0clave opened 2 years ago

aut0clave commented 2 years ago

It would be helpful if dbt-athena supported specifying the parquet_compression and orc_compression properties for models.

By default, Athena uses GZIP compression for Parquet and ORC tables, but it supports several other compression formats (docs). Generally speaking, SNAPPY is faster to read and write, while GZIP yields better compression ratios.

It might also be worth exploring using SNAPPY as the default compression format for Parquet in dbt-athena.

aut0clave commented 2 years ago

Or, better yet, simply support the new-as-of-yesterday write_compression parameter that works for all output types. Release note here: https://docs.aws.amazon.com/athena/latest/ug/release-note-2021-09-16.html

There is also documentation on which formats support which compression types, which might complicate the implementation here.
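
For illustration, here is a rough sketch of the CTAS statement an adapter could emit for either approach discussed above: the format-specific parquet_compression / orc_compression properties, or the unified write_compression property. The helper name `ctas_with_compression` and its parameters are hypothetical and are not part of dbt-athena; the sketch also does no checking of which codecs a given format accepts, which is exactly the complication mentioned above.

```python
def ctas_with_compression(table, select_sql, file_format="PARQUET",
                          compression="SNAPPY", unified=True):
    """Build an Athena CTAS statement with a compression table property.

    Hypothetical helper for illustration only; not dbt-athena's API.
    No validation of format/codec combinations is attempted.
    """
    fmt = file_format.upper()
    codec = compression.upper()

    if unified:
        # Unified property that applies to all output formats.
        prop = "write_compression"
    elif fmt in ("PARQUET", "ORC"):
        # Format-specific properties: parquet_compression / orc_compression.
        prop = f"{fmt.lower()}_compression"
    else:
        raise ValueError(f"no format-specific compression property for {fmt}")

    return (
        f"CREATE TABLE {table}\n"
        f"WITH (format = '{fmt}', {prop} = '{codec}')\n"
        f"AS {select_sql}"
    )


if __name__ == "__main__":
    # Table and source names below are placeholders.
    print(ctas_with_compression("analytics.events_snappy",
                                "SELECT * FROM raw.events"))
```

With unified=False and file_format="ORC", the same call would emit orc_compression instead of write_compression.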

owenprough-sift commented 1 year ago

@Tomme can you mark this issue as closed by #53?