While adopting dbt-athena we noticed that for some of our databases dbt docs generate hangs indefinitely.
After some investigation we found that it happens when using the athena__get_catalog macro for a db with more than 100 tables. This is not a dbt-athena issue itself, rather an AWS Athena internal issues. We ended up talking with AWS support and they mentioned that an issue most likely related to this is something they are working on, but have no timeline for when it's going to be fixed.
In the mean time the workaround that we were advised to use is to also filter by tables. After some manual testing we found that splitting the catalog query to filter by databases AND tables (in batches of 100) solves the issue. Sample of the query:
select *
from (
(
with tables as (
...
),
columns as (
...
),
catalog as (
...
)
select *
from catalog
where lower("table_schema") = lower('very_large_database')
and (
"table_name" = lower('table_1')
or "table_name" = lower('table_2')
....
or "table_name" = lower('table_100')
)
union all
select *
from catalog
where lower("table_schema") = lower('very_large_database')
and (
"table_name" = lower('table_101')
or "table_name" = lower('table_102')
....
or "table_name" = lower('table_200')
)
.....
)
These tables can be retrieved from the manifest as part of the adapter get_catalog().
While adopting dbt-athena we noticed that for some of our databases
dbt docs generate
hangs indefinitely.After some investigation we found that it happens when using the
athena__get_catalog
macro for a db with more than 100 tables. This is not a dbt-athena issue itself, rather an AWS Athena internal issues. We ended up talking with AWS support and they mentioned that an issue most likely related to this is something they are working on, but have no timeline for when it's going to be fixed.In the mean time the workaround that we were advised to use is to also filter by tables. After some manual testing we found that splitting the catalog query to filter by databases AND tables (in batches of 100) solves the issue. Sample of the query:
These tables can be retrieved from the manifest as part of the adapter
get_catalog()
.