Tomme / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
Apache License 2.0
142 stars 79 forks

Support large DBs by requesting only tables defined in manifest #86

Closed (JustasCe closed 2 years ago)

JustasCe commented 2 years ago

fixes https://github.com/Tomme/dbt-athena/issues/79

This PR addresses the same issue and makes the same changes as https://github.com/Tomme/dbt-athena/pull/80, with the review comments from that PR addressed. It is opened from a new fork that we are able to maintain.

Overview

Currently, the athena__get_catalog macro query hangs indefinitely when generating docs for a database with more than 100 tables. These changes fix the issue by specifying which tables to fetch for each database and by batching the requests to a maximum of 100 tables each.

What's changed

This PR overrides the default BaseAdapter._get_catalog_schemas() function with a custom AthenaAdapter._get_catalog_schemas(), which uses a custom AthenaSchemaSearchMap instead of SchemaSearchMap. SchemaSearchMap.add() only produces a dictionary whose values are sets of database names, whereas the new AthenaSchemaSearchMap.add() produces a dictionary of dictionaries, where each inner key is a database name and each inner value is the set of tables in that database. The type annotations on _get_one_catalog are also updated to match.
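
To illustrate the nested shape described above, here is a minimal standalone sketch. It is not the code from this PR: the class name `NestedSearchMap`, the plain-string keys, and the lowercasing of names are assumptions made for illustration, and it does not import or depend on dbt.

```python
from typing import Dict, Optional, Set


class NestedSearchMap(Dict[str, Dict[str, Set[str]]]):
    """Hypothetical stand-in for the nested map: catalog -> database -> table names.

    The real adapter keys on dbt relation objects; plain strings are used
    here so the sketch runs on its own.
    """

    def add(self, catalog: str, database: Optional[str], table: str) -> None:
        # Relations without a database cannot be grouped, so skip them.
        if database is None:
            return
        # Assumption: names are lowercased to avoid duplicate entries.
        tables = self.setdefault(catalog, {}).setdefault(database.lower(), set())
        tables.add(table.lower())


search_map = NestedSearchMap()
search_map.add("awsdatacatalog", "Analytics", "Orders")
search_map.add("awsdatacatalog", "analytics", "customers")
search_map.add("awsdatacatalog", "staging", "raw_orders")
```

With a flat map, only the database names would be known; the nested form keeps the table names too, which is what lets the catalog query filter on specific tables.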

Using the new AthenaSchemaSearchMap, the athena__get_catalog macro now batches its queries so that each SELECT in the UNION covers at most 100 tables per database.
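
The batching itself can be sketched in Python as follows. The actual macro is written in Jinja SQL, so this only illustrates the grouping logic; the function names `chunked` and `table_batches` are hypothetical, and the sorting is added purely to make the output deterministic.

```python
from typing import Dict, Iterator, List, Set, Tuple

MAX_TABLES_PER_BATCH = 100  # limit stated in the PR description


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def table_batches(
    tables_by_database: Dict[str, Set[str]],
    size: int = MAX_TABLES_PER_BATCH,
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (database, tables) pairs; each pair would back one SELECT in the UNION."""
    for database in sorted(tables_by_database):
        for batch in chunked(sorted(tables_by_database[database]), size):
            yield database, batch


example = {
    "analytics": {f"table_{i:03d}" for i in range(250)},
    "staging": {"raw_orders"},
}
batches = list(table_batches(example))
```

In this example, the 250 tables in `analytics` are split into three batches and `staging` gets one, so no single SELECT has to cover more than 100 tables.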

mrshu commented 2 years ago

@Tomme would you be up for having something like this in dbt-athena? The situation around large amounts of tables is really painful in practice.

Tomme commented 2 years ago

Loving this implementation for the issue at hand! All tested in my environment(s) and happy for it to get merged in 👍