dom-devel opened this issue 1 year ago
Thanks for reaching out @dom-devel !
80% is really high 😰
Unfortunately, I think you're right that this could be a hard one for us to reproduce.
An idea for you:
dbt-bigquery==1.4.0
This version has a new way to handle connection retries for BigQuery (#230)
Are you in a position to try upgrading to dbt-bigquery==1.4.0 and see if it changes anything for you?
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
I have a similar problem with dbt for BigQuery:
From time to time a run, or even dbt debug, takes 7 minutes, and 99% of that time is spent checking the connection.
Could you share your environment details @SergeyMatskevich ?
Example:
OS: Ubuntu 20.04
Python: 3.9.12 (via python3 --version)
dbt-core: 1.5.1 (via dbt --version)
dbt-bigquery: 1.5.3 (via dbt --version)
Do you notice anything that is different when it takes a long time? For example, do you see anything in your logs/dbt.log file that is different between when it is "slow" and when it is "fast"?
Is it possible you are experiencing an unstable internet connection when it is "slow"?
Hello,
Thank you for getting back to me!
My setup is:
OS: macOS 14.1.1 (23B81)
Python: 3.8.3
dbt-core: 1.7.3
dbt-bigquery: 1.7.2
So I've run dbt build with -d to see where the issue is. Here are two examples, from yesterday's and today's runs.
It is fast when the time between these two lines is less than 10 seconds:
17:20:08.682442 [debug] [ThreadPool]: Opening a new connection, currently in state init
17:20:10.212892 [debug] [ThreadPool]: Re-using an available connection from the pool (some connections data here)
It is slow when the time between these two lines is around 7 minutes (and it is always 7 minutes):
09:39:51.663813 [debug] [ThreadPool]: Opening a new connection, currently in state init
09:47:25.522677 [debug] [ThreadPool]: Re-using an available connection from the pool (some connections data here)
@SergeyMatskevich It would be helpful if we can figure out what code dbt is executing during this 7 minutes. Could you try to induce a connection that you expect to take 7 minutes and then interrupt it with control + c?
dbt has some retry logic in place, so it's possible you will need to do multiple control + c to get it to fully exit. Then could you search your logs for a Python stacktrace and share the entire stack trace?
It's interesting that it always takes about 7 minutes when it is slow. There are 454 seconds between 9:39:51 and 9:47:25, which rounds down to 450 seconds (7.5 minutes). That's the same as 90 seconds 5 times, or 150 seconds 3 times.
So maybe there's a single setting of 450 somewhere, or a 90 or 150 second value multiplied by a handful of retries.
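A quick sanity check on that arithmetic, using the timestamps copied from the "slow" log excerpt above:

```python
from datetime import datetime

# Timestamps from the "slow" log excerpt quoted above.
start = datetime.strptime("09:39:51", "%H:%M:%S")
end = datetime.strptime("09:47:25", "%H:%M:%S")

elapsed = (end - start).total_seconds()
print(elapsed)          # 454.0
print(5 * 90, 3 * 150)  # 450 450, both land right at the rounded gap
```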
The way dbt-bigquery allows for configuration of timeouts and retries is described here.
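For anyone landing here, those knobs live on the BigQuery target in profiles.yml. A sketch, with field names from the dbt-bigquery profile docs and purely illustrative values (the project and dataset names are made up):

```yaml
my_profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth
      project: my-gcp-project   # illustrative
      dataset: analytics        # illustrative
      threads: 4
      # Retry/timeout settings:
      job_retries: 1                      # retries per failed job
      job_creation_timeout_seconds: 30    # cap on submitting a job
      job_execution_timeout_seconds: 300  # cap on waiting for a job to finish
      job_retry_deadline_seconds: 600     # overall deadline across retries
```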
@dbeatty10 hello!
Here is what I found as a stack trace after I aborted execution with control + c:
[0m20:41:09.285666 [error] [MainThread]: Traceback (most recent call last):
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/cli/requires.py", line 90, in wrapper
result, success = func(*args, **kwargs)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/cli/requires.py", line 75, in wrapper
return func(*args, **kwargs)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/cli/requires.py", line 168, in wrapper
return func(*args, **kwargs)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/cli/requires.py", line 197, in wrapper
return func(*args, **kwargs)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/cli/requires.py", line 244, in wrapper
return func(*args, **kwargs)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/cli/requires.py", line 284, in wrapper
return func(*args, **kwargs)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/cli/main.py", line 625, in run
results = task.run()
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/task/runnable.py", line 474, in run
result = self.execute_with_hooks(selected_uids)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/task/runnable.py", line 434, in execute_with_hooks
self.before_run(adapter, selected_uids)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/task/run.py", line 446, in before_run
self.create_schemas(adapter, required_schemas)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/task/runnable.py", line 572, in create_schemas
for ls_future in as_completed(list_futures):
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/concurrent/futures/_base.py", line 244, in as_completed
waiter.event.wait(wait_timeout)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/threading.py", line 558, in wait
signaled = self._cond.wait(timeout)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/threading.py", line 302, in wait
waiter.acquire()
Thanks for this stacktrace @SergeyMatskevich !
The part that sticks out to me is:
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/task/run.py", line 446, in before_run
self.create_schemas(adapter, required_schemas)
I'm assuming you got this when doing either a dbt run or a dbt build?
If so, is there any chance you could do it with dbt debug as well, to see if the stack trace looks the same or different?
@dbeatty10 Sure
Here is a stacktrace after dbt debug:
[0m07:52:22.663609 [error] [MainThread]: Traceback (most recent call last):
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/cli/requires.py", line 90, in wrapper
result, success = func(*args, **kwargs)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/cli/requires.py", line 75, in wrapper
return func(*args, **kwargs)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/cli/main.py", line 444, in debug
results = task.run()
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/task/debug.py", line 153, in run
connection_status = self.test_connection()
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/task/debug.py", line 472, in test_connection
connection_result = self.attempt_connection(self.profile)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/task/debug.py", line 450, in attempt_connection
adapter.debug_query()
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/adapters/bigquery/impl.py", line 922, in debug_query
self.execute("select 1 as id")
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/adapters/base/impl.py", line 310, in execute
return self.connections.execute(sql=sql, auto_begin=auto_begin, fetch=fetch, limit=limit)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/adapters/bigquery/connections.py", line 514, in execute
query_job, iterator = self.raw_execute(sql, limit=limit)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/adapters/bigquery/connections.py", line 467, in raw_execute
client = conn.handle
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/contracts/connection.py", line 90, in handle
self._handle.resolve(self)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/contracts/connection.py", line 114, in resolve
return self.opener(connection)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/adapters/bigquery/connections.py", line 400, in open
handle = cls.get_bigquery_client(connection.credentials)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/google/api_core/retry.py", line 349, in retry_wrapped_func
return retry_target(
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/google/api_core/retry.py", line 191, in retry_target
return target()
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/adapters/bigquery/connections.py", line 381, in get_bigquery_client
creds = cls.get_credentials(profile_credentials)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/adapters/bigquery/connections.py", line 376, in get_credentials
return cls.get_google_credentials(profile_credentials)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/adapters/bigquery/connections.py", line 338, in get_google_credentials
credentials, _ = get_bigquery_defaults(scopes=profile_credentials.scopes)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/dbt/adapters/bigquery/connections.py", line 85, in get_bigquery_defaults
credentials, _ = google.auth.default(scopes=scopes)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/google/auth/_default.py", line 657, in default
credentials, project_id = checker()
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/google/auth/_default.py", line 651, in <lambda>
lambda: _get_gcloud_sdk_credentials(quota_project_id=quota_project_id),
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/google/auth/_default.py", line 242, in _get_gcloud_sdk_credentials
project_id = _cloud_sdk.get_project_id()
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/google/auth/_cloud_sdk.py", line 106, in get_project_id
project = _run_subprocess_ignore_stderr(
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/site-packages/google/auth/_cloud_sdk.py", line 88, in _run_subprocess_ignore_stderr
output = subprocess.check_output(command, stderr=devnull)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/subprocess.py", line 411, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/subprocess.py", line 491, in run
stdout, stderr = process.communicate(input, timeout=timeout)
File "/Users/sergey_matskevich/miniconda3/lib/python3.8/subprocess.py", line 1011, in communicate
stdout = self.stdout.read()
KeyboardInterrupt
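For what it's worth, this trace bottoms out in subprocess.check_output with no timeout argument: google.auth is shelling out to the gcloud CLI to resolve a project ID, and the wait lasts as long as that child process does. The stdlib call does accept a timeout parameter that raises subprocess.TimeoutExpired instead of blocking indefinitely; a minimal sketch of that behavior (illustrative only, not dbt's or google-auth's actual code):

```python
import subprocess
import sys

# Simulate a child process that blocks for a long time, and cap the wait
# with check_output's `timeout` parameter (absent from the call in the
# trace above). On timeout, the child is killed and TimeoutExpired raised.
slow_child = [sys.executable, "-c", "import time; time.sleep(30)"]
try:
    subprocess.check_output(slow_child, timeout=0.5)
    timed_out = False
except subprocess.TimeoutExpired:
    timed_out = True
    print("gave up after 0.5s instead of hanging")
```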
@SergeyMatskevich bummer: I didn't notice anything in common between those stack traces :(
Going to label this as help_wanted in case someone who is experiencing this is able to propose a fix.
Enterprise customer ran into this on 1.6:
2024-10-16 23:16:40.920613 (MainThread): 23:16:40 Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f9b7568b650>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f9b75c2f590>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f9b7568b510>]}
2024-10-16 23:16:40.925769 (MainThread): 23:16:40 Running with dbt=1.6.18
2024-10-16 23:16:40.926337 (MainThread): 23:16:40 running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'debug': 'True', 'fail_fast': 'False', 'log_path': '/tmp/jobs/2984146/target/logs', 'version_check': 'True', 'profiles_dir': '/tmp/jobs/2984146/.dbt', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'log_format': 'json', 'introspect': 'True', 'invocation_command': 'dbt --log-format json --debug run --select <REDACTED> --target default --profile user --profiles-dir /tmp/jobs/2984146/.dbt --project-dir /tmp/jobs/2984146/target', 'static_parser': 'True', 'target_path': 'None', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'send_anonymous_usage_stats': 'True'}
2024-10-16 23:16:42.204271 (MainThread): 23:16:42 Sending event: {'category': 'dbt', 'action': 'project_id', 'label': '5deb62f3-729f-4ea7-b62d-35ce510d5751', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f9b639c1e90>]}
2024-10-16 23:16:42.215702 (MainThread): 23:16:42 Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': '5deb62f3-729f-4ea7-b62d-35ce510d5751', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f9b62845910>]}
2024-10-16 23:16:42.216565 (MainThread): 23:16:42 Registered adapter: bigquery=1.6.13
2024-10-16 23:16:42.231637 (MainThread): 23:16:42 checksum: 481e05787e7519421182df46accf3090beaa0a57cf7211ead4be3015b53392c8, vars: {}, profile: user, target: default, version: 1.6.18
2024-10-16 23:16:42.987418 (MainThread): 23:16:42 Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
2024-10-16 23:16:42.988105 (MainThread): 23:16:42 Partial parsing enabled, no changes found, skipping parsing
2024-10-16 23:16:42.999181 (MainThread): 23:16:42 Sending event: {'category': 'dbt', 'action': 'plugin_get_nodes', 'label': '5deb62f3-729f-4ea7-b62d-35ce510d5751', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f9b5d224050>]}
2024-10-16 23:16:43.214886 (MainThread): 23:16:43 Sending event: {'category': 'dbt', 'action': 'load_project', 'label': '5deb62f3-729f-4ea7-b62d-35ce510d5751', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f9b6092acd0>]}
2024-10-16 23:16:44.155836 (MainThread): 23:16:44 Sending event: {'category': 'dbt', 'action': 'resource_counts', 'label': '5deb62f3-729f-4ea7-b62d-35ce510d5751', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f9b60247490>]}
2024-10-16 23:16:44.156606 (MainThread): 23:16:44 Found 1833 models, 106 tests, 3577 sources, 0 exposures, 0 metrics, 649 macros, 0 groups, 0 semantic models
2024-10-16 23:16:44.157482 (MainThread): 23:16:44 Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '5deb62f3-729f-4ea7-b62d-35ce510d5751', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f9b60234810>]}
2024-10-16 23:16:44.192444 (MainThread): 23:16:44
2024-10-16 23:16:44.193392 (MainThread): 23:16:44 Acquiring new bigquery connection 'master'
2024-10-16 23:16:44.195034 (ThreadPoolExecutor-0_0): 23:16:44 Acquiring new bigquery connection 'list_<REDACTED>'
2024-10-16 23:16:44.195667 (ThreadPoolExecutor-0_0): 23:16:44 Opening a new connection, currently in state init
2024-10-16 23:16:51.325922 (ThreadPoolExecutor-1_0): 23:16:51 Re-using an available connection from the pool (formerly list_<REDACTED>, now list_<REDACTED>)
2024-10-16 23:16:51.326993 (ThreadPoolExecutor-1_1): 23:16:51 Acquiring new bigquery connection 'list_<REDACTED>'
2024-10-16 23:16:51.327973 (ThreadPoolExecutor-1_2): 23:16:51 Acquiring new bigquery connection 'list_<REDACTED>'
2024-10-16 23:16:51.328601 (ThreadPoolExecutor-1_0): 23:16:51 Opening a new connection, currently in state closed
2024-10-16 23:16:51.329286 (ThreadPoolExecutor-1_3): 23:16:51 Acquiring new bigquery connection 'list_<REDACTED>_BR_REFERENCEDB'
2024-10-16 23:16:51.329922 (ThreadPoolExecutor-1_1): 23:16:51 Opening a new connection, currently in state init
2024-10-16 23:16:51.330590 (ThreadPoolExecutor-1_2): 23:16:51 Opening a new connection, currently in state init
2024-10-16 23:16:51.370350 (ThreadPoolExecutor-1_3): 23:16:51 Opening a new connection, currently in state init
2024-10-16 23:16:51.739158 (ThreadPoolExecutor-1_2): 23:16:51 Re-using an available connection from the pool (formerly list_<REDACTED>, now list_<REDACTED>)
2024-10-16 23:16:51.760126 (ThreadPoolExecutor-1_2): 23:16:51 Opening a new connection, currently in state closed
2024-10-16 23:16:51.923865 (ThreadPoolExecutor-1_3): 23:16:51 Re-using an available connection from the pool (formerly list_<REDACTED>_BR_REFERENCEDB, now list_<REDACTED>)
2024-10-16 23:16:51.945046 (ThreadPoolExecutor-1_3): 23:16:51 Opening a new connection, currently in state closed
2024-10-16 23:16:53.665508 (MainThread): 23:16:53 Sending event: {'category': 'dbt', 'action': 'runnable_timing', 'label': '5deb62f3-729f-4ea7-b62d-35ce510d5751', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7f9b5fd6bad0>]}
2024-10-16 23:16:53.666798 (MainThread): 23:16:53
2024-10-16 23:16:53.666798 (MainThread): 23:16:53 Concurrency: 4 threads (target='default')
2024-10-16 23:16:53.667275 (MainThread): 23:16:53
2024-10-16 23:16:53.674258 (Thread-1 (worker)): 23:16:53 Began running node model.<REDACTED>
2024-10-16 23:16:53.674953 (Thread-1 (worker)): 23:16:53 1 of 1 START sql incremental model <REDACTED> [RUN]
2024-10-16 23:16:53.675759 (Thread-1 (worker)): 23:16:53 Re-using an available connection from the pool (formerly <REDACTED>, now <REDACTED>)
2024-10-16 23:16:53.676275 (Thread-1 (worker)): 23:16:53 Began compiling node model.<REDACTED>
2024-10-16 23:16:53.686170 (Thread-1 (worker)): 23:16:53 Writing injected SQL for node "model.<REDACTED>"
2024-10-16 23:16:53.687106 (Thread-1 (worker)): 23:16:53 Timing info for model.<REDACTED> (compile): 23:16:53.676689 => 23:16:53.686920
2024-10-16 23:16:53.687680 (Thread-1 (worker)): 23:16:53 Began executing node model.<REDACTED>
2024-10-16 23:16:53.721153 (Thread-1 (worker)): 23:16:53 Opening a new connection, currently in state closed
dbt command interrupted
2024-10-17 02:58:09.782248 (MainThread): 02:58:09 The bigquery adapter does not support query cancellation. Some queries may still be running!
^ The run stalled attempting to connect and went on for hours until the user cancelled the run. I'm wondering if we should set some sort of timeout on the connection attempt?
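One generic way to get such a timeout, assuming the blocking call itself can't be changed, is to run the connection attempt in a worker thread and bound the wait. A sketch, illustrative only and not a proposed patch (note the caveat in the comments: the worker keeps running after a timeout, which is exactly the awkwardness a real fix would need to address):

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def call_with_deadline(fn, deadline_seconds, *args, **kwargs):
    # Run fn in a worker thread and stop waiting after deadline_seconds.
    # Caveat: the worker thread keeps running in the background after a
    # timeout, so this bounds the wait, not the work.
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=deadline_seconds)
    except FutureTimeout:
        raise TimeoutError(f"gave up after {deadline_seconds}s")
    finally:
        pool.shutdown(wait=False)
```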
Is this a new bug in dbt-core?
Current Behavior
I have a relatively small dbt project, ~23 models.
When trying to run dbt run, or (more often) a small segment of the project with dbt run --models analytics_dashboard --full-refresh, the connection to BigQuery hangs and never progresses roughly 80% of the time (although this varies from day to day).
When running the compiled query directly in BigQuery, it takes approximately 5-10 seconds.
Most of the time there are no failure logs or any other output; the run just fails silently.
Best guess
I'd guess the connection is somehow silently dropping and maybe dbt doesn't try to reconnect? If I toggle my WiFi network off and on mid-run, dbt will realise the disconnect, attempt to reconnect, succeed, run a query, and then drop again.
Expected Behavior
It shouldn't hang; it should instead attempt to reconnect (or provide additional logging output).
Steps To Reproduce
Generate a basic dbt project with some queries which run on BigQuery.
But it's not a consistent problem, so I suspect it's going to be hard to recreate :(
Relevant log output
No response
Environment
Which database adapter are you using with dbt?
bigquery
Additional Context
No response