Dandandan / dbt-athena

dbt adapter for Athena
Apache License 2.0
39 stars 9 forks source link

fix `dbt docs generate` for athena execution engines 1 & 2. #37

Closed haleemur closed 1 year ago

haleemur commented 3 years ago

execution engine 2 is being rolled out in us-east-1 (and a few other regions) currently. soon, it will become the default engine athena uses worldwide.

there are some sql-parsing incompatibilities between the two versions when tables are joined like this

from table1
join table2 using (col1, col2)

in version 1, it was necessary to specify col1 as either table1.col1 or table2.col1 in the where clause or select clause. however in version 2, that generates an error

to ensure compatibility across both versions, the simplest thing to do is to avoid specifying the join as using (col1, col2)

this pattern is currently used in fetching the database metadata when executing dbt docs generate, so I was not able to build docs against athena running on execution engine version 2.

with the change, dbt docs generate works successfully across both engine versions.

limitation with athena execution engine version 1

information.colunms does not seem to contain information about views. This seems to have been fixed in engine v2

stumelius commented 3 years ago

Hi @haleemur ,

I tried your updated catalog script with both dbt 0.17.2 and 0.18.1 and managed to successfully generate the docs. However, the docs are missing columns that are not specified in schema.yml files. And the columns that are defined in the schema files are missing column types if the model is a view. For materialized models the column types are shown correctly. There's probably something wrong with combining the catalog data with the data from schema.yml files. Do you happen to know anything about this?

EDIT: I checked the generated manifest.json and it only contained my seed nodes. Further investigation revealed that

select
*
from information_schema.columns

does not contain any of my view models. The views and their columns are available in the Glue console though, so the metadata exists but not in information_schema.columns...

haleemur commented 3 years ago

@smomni yes, i noticed that as well & had made a note of it at the end of my pull request's message

limitation with athena execution engine version 1 information.columns does not seem to contain information about views. This seems to have been fixed in engine v2

this PR does not attempt to fix that issue as it is a bug with athena, which is resolved in v2. I had considered looking up the glue catalog directly, but that adds additional complexity, the need for which will go away as aws rolls out the newer engine across more regions.

thank you for testing that this branch works with dbt versions 0.17.2 & 0.18.1. that has been sitting on the back of my mind for a while.