Tomme / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
Apache License 2.0

Support large DBs for dbt docs #80

Closed JustasCe closed 2 years ago

JustasCe commented 2 years ago

fixes https://github.com/Tomme/dbt-athena/issues/79

Overview

Currently the athena__get_catalog macro query hangs indefinitely when generating docs for a database with more than 100 tables. These changes fix the issue by specifying which tables to fetch for each database and batching the queries to a maximum of 100 tables each.

What's changed

This PR overrides the default BaseAdapter._get_catalog_schemas() function with a custom AthenaAdapter._get_catalog_schemas(), which uses a custom AthenaSchemaSearchMap instead of SchemaSearchMap. SchemaSearchMap.add() only builds a dictionary whose values are sets of database names. The new AthenaSchemaSearchMap.add() builds a dictionary of dictionaries, where each inner key is a database name and its value is the set of tables in that database.
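To make the difference in shape concrete, here is a minimal sketch of a nested search map. It is not the adapter's code: the class name NestedSearchMapSketch, the plain-string keys, and the example catalog, database, and table names are all assumptions for illustration, and the real AthenaSchemaSearchMap works with dbt relation objects rather than strings.

```python
from typing import Dict, Set


class NestedSearchMapSketch(Dict[str, Dict[str, Set[str]]]):
    """Hypothetical stand-in for AthenaSchemaSearchMap (illustration only)."""

    def add(self, info_schema: str, database: str, table: str) -> None:
        # Outer key: information schema (assumed); inner key: database name;
        # value: the set of table names to fetch from that database.
        self.setdefault(info_schema, {}).setdefault(database, set()).add(table)


search_map = NestedSearchMapSketch()
search_map.add("awsdatacatalog", "analytics", "orders")
search_map.add("awsdatacatalog", "analytics", "customers")
search_map.add("awsdatacatalog", "staging", "raw_orders")
```

Keeping the table names alongside each database is what lets the catalog query ask for specific tables instead of scanning everything in the database.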

Using the new AthenaSchemaSearchMap, the athena__get_catalog macro now batches its queries so that each select in the union covers at most 100 tables per database.
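The batching idea can be sketched in plain Python as follows. The actual macro is written in Jinja SQL, so this is only an illustration of the chunking logic; the function names batch_tables and plan_catalog_batches and the example table names are hypothetical, and only the limit of 100 comes from the description above.

```python
from typing import Dict, Iterable, Iterator, List, Set, Tuple

MAX_TABLES_PER_QUERY = 100  # batch size stated in the PR description


def batch_tables(tables: Iterable[str], size: int = MAX_TABLES_PER_QUERY) -> Iterator[List[str]]:
    """Yield sorted table names in chunks of at most `size`."""
    ordered = sorted(tables)
    for start in range(0, len(ordered), size):
        yield ordered[start:start + size]


def plan_catalog_batches(tables_by_database: Dict[str, Set[str]]) -> List[Tuple[str, List[str]]]:
    """Return one (database, table batch) pair per select that would be unioned."""
    plan = []
    for database in sorted(tables_by_database):
        for batch in batch_tables(tables_by_database[database]):
            plan.append((database, batch))
    return plan


# Hypothetical database with 250 tables: expect batches of 100, 100 and 50.
example = {"analytics": {f"table_{i:03d}" for i in range(250)}}
plan = plan_catalog_batches(example)
```

Each entry in the plan corresponds to one bounded select, so no single query has to filter on more than 100 table names.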

JustasCe commented 2 years ago

Based on the comments, opened a new PR that we can support: https://github.com/Tomme/dbt-athena/pull/86