`redshift-connector` giving connection time-outs on Codebuild.

Driver version

2.0.918

Redshift version

PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.63282

Client Operating System

Docker container: Debian GNU/Linux 11 (bullseye) on python3.9 docker image Codebuild: Using aws/codebuild/standard:7.0

Python version

python3.9

Table schema

Problem description

Expected behaviour: Codebuild machine connecting correctly to redshift.
Actual behaviour: Codebuild machine not connecting to redshift.

Error message/stack trace:

Traceback (most recent call last):
--
417 | File "/usr/local/lib/python3.9/site-packages/redshift_connector/core.py", line 626, in __init__
418 | self._usock.connect(hostport)
419 | TimeoutError: [Errno 110] Connection timed out
420 |  
421 | During handling of the above exception, another exception occurred:
422 |  
423 | Traceback (most recent call last):
424 | File "/analytics-dbt/test_redshift_connector.py", line 29, in <module>
425 | with redshift_connector.connect(
426 | File "/usr/local/lib/python3.9/site-packages/redshift_connector/__init__.py", line 376, in connect
427 | return Connection(
428 | File "/usr/local/lib/python3.9/site-packages/redshift_connector/core.py", line 689, in __init__
429 | raise InterfaceError("communication error", e)
430 | redshift_connector.error.InterfaceError: ('communication error', TimeoutError(110, 'Connection timed out'))

Any other details that can be helpful:

We have been using dbt, which is a framework for data modeling. For now we are using dbt 1.4.5 which uses psycopg2.
We use codebuild as our CI and so far, when running dbt commands such as dbt compile or dbt run we had no problems on connecting to redshift with our codebuild environment.
We have tried to upgrade to dbt multiple times, from versions 1.5 to 1.7.8 (the last one) which start to use redshift-connector, and we have not been successful as we are getting time outs multiple times.
IMPORTANT: Sometimes we are able to connect, but most of the time we aren't.

Things we've tried:

Using a custom redshift-connector script to do a simple query (to discard that the issue is caused by dbt itself) -> Same result: connection time out
Using a custom psycopg2 script to do a simple query (to discard that the issue is caused by dbt itself) -> THIS WORKS!
Running psql -h <host> -p 5439 -U <user> -d <db> -> Same result: connection time out

It is also important to note that we have no problem when running any of the above approaches in our local machines.

Python Driver trace logs

Reproduction code

import redshift_connector
import os
import time

schema = os.environ.get("DBT_TARGET_SCHEMA")
# Establish a connection to the database
query = f"""
select
        table_catalog as database,
        table_name as name,
        table_schema as schema,
        'table' as type
    from information_schema.tables
    where table_schema ilike '{schema}'
    and table_type = 'BASE TABLE'
    union all
    select
      table_catalog as database,
      table_name as name,
      table_schema as schema,
      case
        when view_definition ilike '%create materialized view%'
          then 'materialized_view'
        else 'view'
      end as type
    from information_schema.views
    where table_schema ilike '{schema}'
"""
with redshift_connector.connect(
    host="<host>",
    database="dbt_ci",
    # user=os.environ.get("DBT_PROFILE_USER"),
    user="dbt_ci",
    password=os.environ.get("DBT_PROFILE_PASSWORD"),
    timeout=999999,
) as conn:
    # Create a new cursor
    with conn.cursor() as cursor:
        start_time = time.time()
        # Execute the SQL query
        cursor.execute(query)
        rows = cursor.fetchall()
        for row in rows:
            print(row)

This does not work, giving the following error:

Traceback (most recent call last):
--
417 | File "/usr/local/lib/python3.9/site-packages/redshift_connector/core.py", line 626, in __init__
418 | self._usock.connect(hostport)
419 | TimeoutError: [Errno 110] Connection timed out
420 |  
421 | During handling of the above exception, another exception occurred:
422 |  
423 | Traceback (most recent call last):
424 | File "/analytics-dbt/test_redshift_connector.py", line 29, in <module>
425 | with redshift_connector.connect(
426 | File "/usr/local/lib/python3.9/site-packages/redshift_connector/__init__.py", line 376, in connect
427 | return Connection(
428 | File "/usr/local/lib/python3.9/site-packages/redshift_connector/core.py", line 689, in __init__
429 | raise InterfaceError("communication error", e)
430 | redshift_connector.error.InterfaceError: ('communication error', TimeoutError(110, 'Connection timed out'))

Whereas the following snippet works:

import psycopg2
import os

schema = os.environ.get("DBT_TARGET_SCHEMA")
query = f"""
select
        table_catalog as database,
        table_name as name,
        table_schema as schema,
        'table' as type
    from information_schema.tables
    where table_schema ilike '{schema}'
    and table_type = 'BASE TABLE'
    union all
    select
      table_catalog as database,
      table_name as name,
      table_schema as schema,
      case
        when view_definition ilike '%create materialized view%'
          then 'materialized_view'
        else 'view'
      end as type
    from information_schema.views
    where table_schema ilike '{schema}'
"""

# Establish a connection to the database
conn = psycopg2.connect(
    dbname="dbt_ci",
    host="<host>",
    port="5439",
    user="dbt_ci",
    password=os.environ.get("DBT_PROFILE_PASSWORD"),
)

# Create a cursor object
cur = conn.cursor()

print(query)
# Execute a query
cur.execute(query)

# Fetch all the rows
rows = cur.fetchall()

for row in rows:
    print(row)

# Close the cursor and connection
cur.close()
conn.close()

Issue is also seen from various other members of the DBT community using Redshift; since they migrated to use redshift_connector over pyscopg2. Queries that consistently time out locally from me, run fine in SQL workbench type tooling (say 15minute query runs) and never time out.

https://github.com/dbt-labs/dbt-redshift/issues/701

Hi folks, thanks for the issue report. I took a look at the associated DBT issue and see it has been closed out as a suspected network issue with Codebuild. Regardless, I understand that folks were not seeing this issue with psycopg2 and see it with redshift-connector when using the driver alone with Codebuild. What's weird to me is that issues are seen with psql and Codebuild. Is the timeout issue consistent for psql? I see the redshift-connector timeout issue is noted as intermittent but primarily timing out.

Given the repro above is using username/password, redshift-connector is essentially just creating a socket to the host and port provided. As the timeout is occurring on connection, and it only presents on codebuild, I'm inclined to believe this issue is related to the networking settings configured on the CodeBuild machine.

Here's a list of possible things to try that can help us rule out potential issues:

disable ssl
confirm codebuild network timeout settings

If you don't see resolution, please let me know and the team can investigate further.

Sorry to add confusion with the DBT issue; it was closed before most of the comments were added. Various sources display issues like like Github Actions, Circle CI, but the majority of people commenting in the dbt slack redshift channel about time outs are using local machines to their clusters.

Some generic observations:

DBT's Psycopg2 version would see this issue too occasionally however we could set the keepalives_idle value which would resolve the issue. The keepalives_idle flag was the most commonly referred solution in slack when others would experience timeouts. That flag is now deprecated.
Some networking environments experience the issue a lot less. For instance DBT Cloud seems to be nearly flawless in tcp timeout. We've migrated from executing our dbt jobs locally to using the dbt cli which runs the connections from their environment; and now rarely see issues.

Hello all, thanks a lot for your comments. We have been checking our network settings and we have managed to run successfully the simple redshift-connector script mentioned above, as well as the psql -h <host> -p 5439 -U <user> -d <db> statement. However, we are still encountering issues while running dbt compile and, in some builds, dbt debug. Having the debug flag in the compile command returns the following:

16:44:08  Acquiring new redshift connection 'master'
--
646 | 16:44:08  Acquiring new redshift connection 'list_***_***_1711039169_500120b4b6f46f3cc96c3062bee86d05b12a887f'
647 | 16:44:08  Using redshift connection "list_***_***_1711039169_500120b4b6f46f3cc96c3062bee86d05b12a887f"
648 | 16:44:08  On list_***_***_1711039169_500120b4b6f46f3cc96c3062bee86d05b12a887f: BEGIN
649 | 16:44:08  Opening a new connection, currently in state init
650 | 16:44:08  Redshift adapter: Establishing connection using ssl with `sslmode` set to 'require'.To connect without ssl, set `sslmode` to 'disable'.
651 | 16:44:08  Redshift adapter: Connecting to redshift with username/password based auth...
652 | 16:46:18  Redshift adapter: Error running SQL: BEGIN
653 | 16:46:18  Redshift adapter: Rolling back transaction.
654 | 16:46:18  Redshift adapter: Error running SQL: macro list_relations_without_caching
655 | 16:46:18  Redshift adapter: Rolling back transaction.
656 | 16:46:18  On list_***_***_1711039169_500120b4b6f46f3cc96c3062bee86d05b12a887f: No close available on handle
657 | 16:46:18  Connection 'master' was properly closed.
658 | 16:46:18  Connection 'list_***_***_1711039169_500120b4b6f46f3cc96c3062bee86d05b12a887f' was properly closed.
659 | 16:46:18  Encountered an error:
660 | Database Error
661 | ('communication error', TimeoutError(110, 'Connection timed out'))

This failing macro is basically the same sql statement that we are running in our script. At this point, I am assuming dbt has something to do in relation to how the connection is being handled using redshift-connector. Possibly the best thing now would be to open an issue directly in dbt, but if you know how we may debug further from this point on, I'd be really grateful.

@Brooke-white We're also seeing timeouts when running DBT in GitHub Actions. We initially thought the keep-alive setting is the issue, but it seems to be enabled by default. However, we don't see any setting as to how often a keep-alive package is sent. Do you have any more info on this? Maybe the interval is higher than the session timeout. Just a thought

aws / amazon-redshift-python-driver