Tomme / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
Apache License 2.0
140 stars 79 forks source link

Support specifying compression type #53

Closed ignacioreyna closed 2 years ago

ignacioreyna commented 2 years ago

based of issue #31

Changes in this PR:

Antauri commented 2 years ago

Hello,

Can this be merged?

Dandandan commented 2 years ago

@ignacioreyna Thanks, this look great (also wanted to try out zstd compression myself) . Could you fix the conflict?

Antauri commented 2 years ago

Waiting on the magic merge :+1: :100:

ignacioreyna commented 2 years ago

Done!

Antauri commented 2 years ago

@Dandandan @Tomme can someone merge this? (and others) ... I've been manually picking branches from here and committing them to my own fork in order to get some nice features in :) ... would be better to just use the upstream instead.

Dandandan commented 2 years ago

Thanks @ignacioreyna @Antauri !

Dandandan commented 2 years ago

@Antauri We are a bit limited in time to maintain this repository. If you or someone else wants to step in, feel free to let us know, @Tomme is willing to add you as maintainer.

Quoting the message of @Tomme here:

Firstly, I have had to change my dbt Slack account, so apologies if I have missed any of your recent messages. Secondly, As you may have noticed over the past six months I have been less responsive regarding the upkeep and maintenance of the dbt-athena adapter, this was largely due to me simply using Athena less and less in my day to day work. This is set to change, I am back working with Athena regularly now and hope to implement timely improvements to the adapter and incorporate the pending improvement people have kindly submitted! If anyone is interested in being added to the project as a maintainer please do just reach out, any help is appreciated. If there is any open Pull Requests people would like me to prioritise please point me to them, Note I am open to transferring ownership of the project if it can be more community driven and/or vendor supported, e.g. dbt Labs or Amazon Web Services - feel free to drop me a message if interested. And lastly, a huge thank you to all contributors, whether it be issues or pull requests - I really did not anticipate the amount of use this adapter would get! In brief, I hope to improve maintenance for the dbt-athena adapter! (edited)

Antauri commented 2 years ago

Sure! When you can, if you'd like, please add me as a maintainer/reviewer. We're using Athena (with Dagster/dbt) for the foreseeable future, which should be helpful in testing some of the pull requests/contributions in a real eco-system.

peralmq commented 2 years ago

I'm getting an error after this change. When I do a dbt run ... I get:

06:43:35  Runtime Error in model <omitted> (models/<omitted>.sql)
06:43:35    HIVE_UNSUPPORTED_FORMAT: Unknown ORC compression type . You may need to manually clean the data at location 's3://<omitted>/tables/74c18181-e106-4d3e-8750-87f45371f5a4' before retrying. Athena will not delete data in your account.

If I remove the line related to the write_compression in https://github.com/ignacioreyna/dbt-athena/blob/a77bfbe18889f05c34cddee2df194159c8f004ab/dbt/include/athena/macros/materializations/models/table/create_table_as.sql#L28-L30 everything is back to working.

I'm wondering (but I have very little knowledge of the dbt-athena code) if this has to do with that https://github.com/Tomme/dbt-athena/pull/53 doesn't add the initial set variable line that is done for other variables, i.e. the format variable {%- set format = config.get('format', default='parquet') -%} has https://github.com/ignacioreyna/dbt-athena/blob/a77bfbe18889f05c34cddee2df194159c8f004ab/dbt/include/athena/macros/materializations/models/table/create_table_as.sql#L7

Maybe you should add {%- set write_compression = config.get('write_compression', default='snappy') -%} or {%- set write_compression = config.get('write_compression', default=none) -%}?

peralmq commented 2 years ago

The {%- set write_compression = config.get('write_compression', default=none) -%} was part of the original commit https://github.com/Tomme/dbt-athena/pull/53/commits/4893009b4d9fcd2b27b5571b785dd83269967e5b#diff-c741e161a85cd7b35c19884cff9f6790dd483de64ac687cf4c44e1947d478fa1R15 and must have been lost in a later one. So, I'm pretty sure that should go into the master branch.

Btw, can https://github.com/Tomme/dbt-athena/issues/31 be closed now that this is merged?

Gatsby-Lee commented 2 years ago

hi, yep. I have the same issue. I investigated the same issue and got to the same point.

Gatsby-Lee commented 2 years ago

@Dandandan I am also actively using DBT + DBT-Athena + Dagster. If you can, plz promote me to maintainer/reviewer.

Thank you

juhoautio-rovio commented 2 years ago

Also encountered the issue about write_compression, ref. https://github.com/Tomme/dbt-athena/pull/53#issuecomment-1080268443.

Seems like there's no open issue about it yet, no PR either? Fix seems to be simple. Is anyone planning to make that fix?