dbt-labs / dbt-presto

[ARCHIVED] The Presto adapter plugin for dbt Core
http://getdbt.com/
Apache License 2.0

DBT-presto Glue support #41

Open oleksandrkovalenko opened 3 years ago

oleksandrkovalenko commented 3 years ago

I have a dbt project configured to query Hive via Presto. We are using AWS EMR with the AWS Glue Data Catalog. I have added the recommended configuration for Presto:

hive.metastore-cache-ttl=0s
hive.metastore-refresh-interval=5s
hive.allow-drop-table=true
hive.allow-rename-table=true

When I run dbt run, it fails with:

PrestoUserError(type=USER_ERROR, name=NOT_SUPPORTED, message="Table rename is not yet supported by Glue service")

Is there a way to configure dbt or dbt-presto to run different queries instead of renaming tables?

jtcohen6 commented 3 years ago

@oleksandrkovalenko This is really interesting. It is possible to reimplement/override the basic materializations, by copying-pasting-editing from dbt-presto into your own project. In particular, this is the offending bit of logic:

https://github.com/fishtown-analytics/dbt-presto/blob/81efb93ac809cd4874713825833cf63c6350e94e/dbt/include/presto/macros/materializations/table.sql#L41-L43
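To make the failure mode concrete, here is a minimal sketch of the statement sequence a rename-based table swap typically issues. It is an illustration in Python, not the adapter's actual code: the function names and the __dbt_tmp / __dbt_backup suffixes are assumptions chosen for the example. Any ALTER TABLE ... RENAME TO step in such a sequence is the kind of statement the Glue-backed catalog rejects with NOT_SUPPORTED.

```python
from __future__ import annotations


def rename_swap_statements(schema: str, table: str, select_sql: str, exists: bool) -> list[str]:
    """Return, in order, the SQL a rename-based swap would run (illustrative only)."""
    target = f"{schema}.{table}"
    tmp = f"{schema}.{table}__dbt_tmp"        # hypothetical suffix
    backup = f"{schema}.{table}__dbt_backup"  # hypothetical suffix

    # Build the new data next to the live table.
    statements = [f"CREATE TABLE {tmp} AS {select_sql}"]
    if exists:
        # Move the current table out of the way; requires rename support.
        statements.append(f"ALTER TABLE {target} RENAME TO {backup}")
    # Promote the freshly built table; also requires rename support.
    statements.append(f"ALTER TABLE {tmp} RENAME TO {target}")
    if exists:
        statements.append(f"DROP TABLE {backup}")
    return statements


def needs_rename(statements: list[str]) -> bool:
    """True if any statement relies on table rename."""
    return any(" RENAME TO " in s for s in statements)
```

Note that in this sketch even the first run of a model (exists=False) contains a rename, so a catalog without rename support would fail on every run, not only on rebuilds.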

There's a larger question here that goes beyond table rename. We've always known that Presto's functionality varies tremendously based on the connector being used: transactions, atomic DML, metadata availability, etc. How should we think about structuring a dbt plugin for Presto, given the functional variance? Does it make sense to have a wide array of plugins, each for use with a different flavor of Presto/Trino/etc?

friendofasquid commented 3 years ago

We have had success reimplementing the table materialisation in our project:

{% materialization table, adapter='presto' -%}
  {%- set identifier = model['alias'] -%}

  {%- set old_relation = adapter.get_relation(database=database, schema=schema, identifier=identifier) -%}
  {%- set target_relation = api.Relation.create(identifier=identifier,
                                                schema=schema,
                                                database=database,
                                                type='table') -%}

  {{ run_hooks(pre_hooks) }}

  {%- if old_relation is not none -%}
      {{ adapter.drop_relation(old_relation) }}
  {%- endif -%}

  -- build model
  {% call statement('main') -%}
    {{ create_table_as(False, target_relation, sql) }}
  {% endcall -%}

  {{ run_hooks(post_hooks) }}

  {% do persist_docs(target_relation, model) %}

  {{ return({'relations': [target_relation]}) }}
{%- endmaterialization -%}

IIRC, we took this from the dbt-athena connector. It works fine, except for the downtime: the table does not exist between the drop and the end of the rebuild. We'll be looking to fix that with some view gymnastics in the next month or so.
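The thread does not describe the planned view-based approach, so the following is only a guess at what such a swap could look like, sketched in Python as an ordered list of SQL statements. It assumes that consumers query a view named after the model, that each run builds a uniquely named backing table, and that the catalog accepts CREATE OR REPLACE VIEW. The function name, the run_id suffix scheme, and the previous_backing parameter are all hypothetical.

```python
from __future__ import annotations


def view_swap_statements(
    schema: str,
    name: str,
    select_sql: str,
    run_id: str,
    previous_backing: str | None = None,
) -> list[str]:
    """Return, in order, the SQL for a rename-free swap behind a view (illustrative only)."""
    view = f"{schema}.{name}"
    backing = f"{schema}.{name}__{run_id}"  # hypothetical naming scheme

    statements = [
        # Build the new data while the view still points at the old table.
        f"CREATE TABLE {backing} AS {select_sql}",
        # Repoint the view; readers switch over without a missing-table window.
        f"CREATE OR REPLACE VIEW {view} AS SELECT * FROM {backing}",
    ]
    if previous_backing is not None:
        # Clean up the table that the view no longer references.
        statements.append(f"DROP TABLE IF EXISTS {previous_backing}")
    return statements
```

No statement here renames a table, which is the point of the exercise. One caveat under these assumptions: if the model already exists as a plain table with the view's name, that table has to be dropped once before the view can be created.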