Tomme / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
Apache License 2.0

Retrieve relation information through Glue API directly #74

Closed by roy-ht 8 months ago

roy-ht commented 2 years ago

Problem

Our Glue database is huge, and retrieving schema information from it often fails.

Athena returns this type of error:

GENERIC_INTERNAL_ERROR: java.lang.RuntimeException: java.lang.InterruptedException: sleep interrupted

More specifically, two macros often cause an error, or their queries are too slow:

Solution

Add override methods:

In these methods, call glue_get_table and glue_get_tables and get the column information directly.
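For readers unfamiliar with the Glue side, here is a minimal sketch of what reading column information straight from the Glue Data Catalog can look like. It is not the code from this PR. It assumes a boto3 Glue client, whose get_tables paginator and get_table call are used below; the helper names columns_from_table, list_columns, and get_columns are made up for illustration.

```python
from typing import Any, Dict, List


def columns_from_table(table: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return name/type pairs for one Glue table dict, partition keys last."""
    storage = table.get("StorageDescriptor") or {}
    raw = list(storage.get("Columns", [])) + list(table.get("PartitionKeys", []))
    return [{"name": col["Name"], "type": col.get("Type", "")} for col in raw]


def list_columns(glue_client: Any, database: str) -> Dict[str, List[Dict[str, str]]]:
    """Map every table in `database` to its columns via paginated GetTables."""
    result: Dict[str, List[Dict[str, str]]] = {}
    # The paginator follows NextToken, so large databases are read page by page.
    paginator = glue_client.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        for table in page.get("TableList", []):
            result[table["Name"]] = columns_from_table(table)
    return result


def get_columns(glue_client: Any, database: str, table_name: str) -> List[Dict[str, str]]:
    """Fetch the columns of a single table via GetTable."""
    response = glue_client.get_table(DatabaseName=database, Name=table_name)
    return columns_from_table(response["Table"])
```

With AWS credentials configured, this could be called as list_columns(boto3.client("glue"), "my_database"), where my_database is a placeholder. No Athena query is issued on this path, so the query error described above should not be triggered by it.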

Effect

owenprough-sift commented 2 years ago

Is this PR obsoleted by recently-merged #86?

roy-ht commented 1 year ago

@owenprough-sift

Is this PR obsoleted by recently-merged #86?

No. I tried the master branch with #86 merged, and it still causes a java.lang.RuntimeException.

roy-ht commented 1 year ago

Users may want to choose whether to use the Glue API, so I'll add a configuration option like use_glue_api: bool.
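As a rough illustration of how such a flag could switch between the two code paths (a sketch only, not the actual implementation): only the name use_glue_api comes from this thread. AdapterOptions, list_relations, the two callables, and the default of False are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AdapterOptions:
    # Hypothetical settings holder; the default value is an assumption.
    use_glue_api: bool = False


def list_relations(
    options: AdapterOptions,
    via_glue: Callable[[str], List[str]],
    via_sql: Callable[[str], List[str]],
    schema: str,
) -> List[str]:
    """Pick the Glue-based or the SQL-based lookup depending on the flag."""
    fetch = via_glue if options.use_glue_api else via_sql
    return fetch(schema)
```

Keeping the flag off by default would leave existing behavior unchanged for users who do not opt in.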

roy-ht commented 1 year ago

@VDFaller

88a868a adds the use_glue_api option

Enriqson commented 1 year ago

I also had a slowness issue with list_relations_without_caching: it worked fine for schemas with up to 100 tables but was painfully slow for 101+. I solved it by updating the Athena engine to version 3. I've tested schemas with up to 296 tables and they all perform quite well. It might help you as well, @roy-ht.