Tomme / dbt-athena

The athena adapter plugin for dbt (https://getdbt.com)
Apache License 2.0
142 stars 79 forks source link

Frequent nondeterministic errors in dbt run and dbt seed when using multiple threads #41

Closed sacundim closed 2 years ago

sacundim commented 2 years ago

Using multiple threads to rebuild models is a huge performance win under Athena because it just scales up like crazy, but when I use it with this adapter I randomly get strange, nondescript errors that say things like credential_provider that sound like they're component or variable names in the AWS client library:

06:35:55 | 6 of 52 ERROR creating table model covid19_puerto_rico_model.hhs_hospitals [ERROR in 0.53s]

[further down]

Completed with 1 error and 0 warnings:

Runtime Error in model hhs_hospitals (models/hhs/staging/hhs_hospitals.sql)
  'credential_provider'

Workarounds

  1. Don't use multithreading (which can be a lot slower)
  2. Rerun just the affected models (dbt run --threads 1 --select hhs_hospitals+)
Dandandan commented 2 years ago

How much threads do you use? Are you sure you don't get throttled by AWS?

sacundim commented 2 years ago

I'm using 5 threads. The service quotas for Athena are at least 20 simultaneous queries:

Even if you exceed the quota, what I understand is that what Athena would do is place your queries into a queue.

Looking at the adapter's code, it sounds to me like throttling should give a different exception, which the adapter will catch and retry:

                retry_config=RetryConfig(
                    attempt=creds.num_retries,
                    exceptions=(
                        "ThrottlingException",
                        "TooManyRequestsException",
                        "InternalServerException",
                    ),

Googling a bit I find this this boto3 issue with a similar error I also see sometimes (endpoint_resolver), where in the comments, dbt project members say that boto3 client creation is not thread safe.

Looking at DBT issues, this one mentions that their intent is to have one connection object per thread. Superficially it seems to me that the Athena adapter is abiding by that, but I'm new to this codebase so maybe I'm missing something. The connection objects being used here, if I understand, are from PyAthena, which adds another layer of complexity—the interaction whereby the boto3 client objects would be created spans three codebases that I'm unfamiliar with.

Tomme commented 2 years ago

@Dandandan Maybe relating, I have noticed on query failures, such as SYNTAX_ERROR this is triggering query retries when ideally it should not be, likely down to retry=retry_if_exception(lambda _: True),

Dandandan commented 2 years ago

@Dandandan Maybe relating, I have noticed on query failures, such as SYNTAX_ERROR this is triggering query retries when ideally it should not be, likely down to retry=retry_if_exception(lambda _: True),

Yeah this was something recently noticed too. I guess we have to specify the specific exceptions here as well.

Dandandan commented 2 years ago

But I think it's not related to this issue.

sacundim commented 2 years ago

I tried the fix with 20 threads at a time and I didn't see any errors now. Thanks folks!