dbt-msft / dbt-sqlserver

dbt adapter for SQL Server and Azure SQL
MIT License

Query retry mechanism not working as expected for intermittent network issues #507

Open ka-weihe opened 3 months ago

ka-weihe commented 3 months ago

Current Behavior

The documentation states:

The number of automatic times to retry a query before failing. Defaults to 1. Queries with syntax errors will not be retried. This setting can be used to overcome intermittent network issues.

(Source: https://docs.getdbt.com/docs/core/connect-data-platform/mssql-setup#authentication-methods--profile-configuration)

However, the code appears to retry only the initial connection attempt. If a connection is established successfully and a query then fails on it some time later (e.g., after 10 minutes), the query is not retried.

(Source: https://github.com/dbt-msft/dbt-sqlserver/blob/f789ab0815b926bd68af6e901cd0e33b2895db3f/dbt/adapters/sqlserver/sql_server_connection_manager.py#L152)

Issue

We are experiencing intermittent errors that do not trigger the retry mechanism:

Database Error in model <redacted>
11:31:30    ('08S01', '[08S01] [Microsoft][ODBC Driver 18 for SQL Server]TCP Provider: Error code 0x274C (10060) (SQLExecDirectW)')

This error code (0x274C / 10060) suggests a connection timeout, but the retry option does not seem to address this issue.
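
For illustration, here is a minimal sketch of how an error like the one above could be classified as transient. It assumes the exception carries the SQLSTATE as its first argument, as the tuple in the log suggests. The helper name `is_transient` and the prefix list are hypothetical and are not part of dbt-sqlserver.

```python
# SQLSTATE class "08" is the connection-exception class; 08S01 is the
# communication link failure seen in the log above.
# Assumption: the driver error exposes the SQLSTATE as exc.args[0].
TRANSIENT_SQLSTATE_PREFIXES = ("08",)


def is_transient(exc: BaseException) -> bool:
    """Return True if the first argument looks like a connection-class SQLSTATE."""
    sqlstate = exc.args[0] if exc.args else None
    return isinstance(sqlstate, str) and sqlstate.startswith(TRANSIENT_SQLSTATE_PREFIXES)
```

A syntax error (SQLSTATE class 42) would not match, which is in line with the documented rule that queries with syntax errors are not retried.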

Expected Behavior

The retry mechanism should work for both initial connections and subsequent query executions to handle intermittent network issues throughout the entire session.
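
To make the expected behavior concrete, here is a rough sketch of a retry loop around query execution. It is not the adapter's actual code: `run_with_retries`, `execute`, `reconnect`, and `is_retryable` are hypothetical names, and the sketch assumes the failed statement is safe to run again on a fresh connection.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    execute: Callable[[], T],
    reconnect: Callable[[], None],
    retries: int,
    is_retryable: Callable[[Exception], bool],
    delay_seconds: float = 0.0,
) -> T:
    """Run execute(); on a retryable error, reconnect and try again up to `retries` times."""
    attempt = 0
    while True:
        try:
            return execute()
        except Exception as exc:
            # Give up when the retry budget is spent or the error is not transient
            # (e.g. a syntax error), and re-raise the original exception.
            if attempt >= retries or not is_retryable(exc):
                raise
            attempt += 1
            reconnect()  # the old connection is assumed to be unusable
            time.sleep(delay_seconds)
```

With `retries=1` this would allow one additional attempt after the first failure. Whether re-running a statement is safe depends on the statement and on transaction state, so this is only meant to show the shape of the behavior being asked for.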

Questions

  1. Is this behavior intentional or a potential bug in the retry implementation?
  2. Are there any workarounds or configuration options to handle these types of intermittent connection issues?
  3. Is there a plan to extend the retry mechanism to cover query execution failures as well as initial connection failures?

Additional Information