aws / amazon-redshift-python-driver

Redshift Python Connector. It supports Python Database API Specification v2.0.
Apache License 2.0
204 stars 76 forks source link

`redshift-connector` giving connection time-outs on Codebuild. #212

Closed gecaro closed 5 months ago

gecaro commented 9 months ago

Driver version

2.0.918

Redshift version

PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.63282

Client Operating System

Docker container: Debian GNU/Linux 11 (bullseye) on python3.9 docker image Codebuild: Using aws/codebuild/standard:7.0

Python version

python3.9

Table schema

Problem description

  1. Expected behaviour: Codebuild machine connecting correctly to redshift.

  2. Actual behaviour: Codebuild machine not connecting to redshift.

  3. Error message/stack trace:

    Traceback (most recent call last):
    --
    417 | File "/usr/local/lib/python3.9/site-packages/redshift_connector/core.py", line 626, in __init__
    418 | self._usock.connect(hostport)
    419 | TimeoutError: [Errno 110] Connection timed out
    420 |  
    421 | During handling of the above exception, another exception occurred:
    422 |  
    423 | Traceback (most recent call last):
    424 | File "/analytics-dbt/test_redshift_connector.py", line 29, in <module>
    425 | with redshift_connector.connect(
    426 | File "/usr/local/lib/python3.9/site-packages/redshift_connector/__init__.py", line 376, in connect
    427 | return Connection(
    428 | File "/usr/local/lib/python3.9/site-packages/redshift_connector/core.py", line 689, in __init__
    429 | raise InterfaceError("communication error", e)
    430 | redshift_connector.error.InterfaceError: ('communication error', TimeoutError(110, 'Connection timed out'))
  4. Any other details that can be helpful:

Things we've tried:

It is also important to note that we have no problem when running any of the above approaches in our local machines.

Python Driver trace logs

Reproduction code

import redshift_connector
import os
import time

schema = os.environ.get("DBT_TARGET_SCHEMA")
# Establish a connection to the database
query = f"""
select
        table_catalog as database,
        table_name as name,
        table_schema as schema,
        'table' as type
    from information_schema.tables
    where table_schema ilike '{schema}'
    and table_type = 'BASE TABLE'
    union all
    select
      table_catalog as database,
      table_name as name,
      table_schema as schema,
      case
        when view_definition ilike '%create materialized view%'
          then 'materialized_view'
        else 'view'
      end as type
    from information_schema.views
    where table_schema ilike '{schema}'
"""
with redshift_connector.connect(
    host="<host>",
    database="dbt_ci",
    # user=os.environ.get("DBT_PROFILE_USER"),
    user="dbt_ci",
    password=os.environ.get("DBT_PROFILE_PASSWORD"),
    timeout=999999,
) as conn:
    # Create a new cursor
    with conn.cursor() as cursor:
        start_time = time.time()
        # Execute the SQL query
        cursor.execute(query)
        rows = cursor.fetchall()
        for row in rows:
            print(row)

This does not work, giving the following error:

Traceback (most recent call last):
--
417 | File "/usr/local/lib/python3.9/site-packages/redshift_connector/core.py", line 626, in __init__
418 | self._usock.connect(hostport)
419 | TimeoutError: [Errno 110] Connection timed out
420 |  
421 | During handling of the above exception, another exception occurred:
422 |  
423 | Traceback (most recent call last):
424 | File "/analytics-dbt/test_redshift_connector.py", line 29, in <module>
425 | with redshift_connector.connect(
426 | File "/usr/local/lib/python3.9/site-packages/redshift_connector/__init__.py", line 376, in connect
427 | return Connection(
428 | File "/usr/local/lib/python3.9/site-packages/redshift_connector/core.py", line 689, in __init__
429 | raise InterfaceError("communication error", e)
430 | redshift_connector.error.InterfaceError: ('communication error', TimeoutError(110, 'Connection timed out'))

Whereas the following snippet works:

import psycopg2
import os

schema = os.environ.get("DBT_TARGET_SCHEMA")
query = f"""
select
        table_catalog as database,
        table_name as name,
        table_schema as schema,
        'table' as type
    from information_schema.tables
    where table_schema ilike '{schema}'
    and table_type = 'BASE TABLE'
    union all
    select
      table_catalog as database,
      table_name as name,
      table_schema as schema,
      case
        when view_definition ilike '%create materialized view%'
          then 'materialized_view'
        else 'view'
      end as type
    from information_schema.views
    where table_schema ilike '{schema}'
"""

# Establish a connection to the database
conn = psycopg2.connect(
    dbname="dbt_ci",
    host="<host>",
    port="5439",
    user="dbt_ci",
    password=os.environ.get("DBT_PROFILE_PASSWORD"),
)

# Create a cursor object
cur = conn.cursor()

print(query)
# Execute a query
cur.execute(query)

# Fetch all the rows
rows = cur.fetchall()

for row in rows:
    print(row)

# Close the cursor and connection
cur.close()
conn.close()
FridayPush commented 8 months ago

Issue is also seen from various other members of the DBT community using Redshift; since they migrated to use redshift_connector over pyscopg2. Queries that consistently time out locally from me, run fine in SQL workbench type tooling (say 15minute query runs) and never time out.

https://github.com/dbt-labs/dbt-redshift/issues/701

Brooke-white commented 8 months ago

Hi folks, thanks for the issue report. I took a look at the associated DBT issue and see it has been closed out as a suspected network issue with Codebuild. Regardless, I understand that folks were not seeing this issue with psycopg2 and see it with redshift-connector when using the driver alone with Codebuild. What's weird to me is that issues are seen with psql and Codebuild. Is the timeout issue consistent for psql? I see the redshift-connector timeout issue is noted as intermittent but primarily timing out.

Given the repro above is using username/password, redshift-connector is essentially just creating a socket to the host and port provided. As the timeout is occurring on connection, and it only presents on codebuild, I'm inclined to believe this issue is related to the networking settings configured on the CodeBuild machine.

Here's a list of possible things to try that can help us rule out potential issues:

  1. disable ssl
  2. confirm codebuild network timeout settings

If you don't see resolution, please let me know and the team can investigate further.

FridayPush commented 8 months ago

Sorry to add confusion with the DBT issue; it was closed before most of the comments were added. Various sources display issues like like Github Actions, Circle CI, but the majority of people commenting in the dbt slack redshift channel about time outs are using local machines to their clusters.

Some generic observations:

gecaro commented 8 months ago

Hello all, thanks a lot for your comments. We have been checking our network settings and we have managed to run successfully the simple redshift-connector script mentioned above, as well as the psql -h <host> -p 5439 -U <user> -d <db> statement. However, we are still encountering issues while running dbt compile and, in some builds, dbt debug. Having the debug flag in the compile command returns the following:

16:44:08  Acquiring new redshift connection 'master'
--
646 | 16:44:08  Acquiring new redshift connection 'list_***_***_1711039169_500120b4b6f46f3cc96c3062bee86d05b12a887f'
647 | 16:44:08  Using redshift connection "list_***_***_1711039169_500120b4b6f46f3cc96c3062bee86d05b12a887f"
648 | 16:44:08  On list_***_***_1711039169_500120b4b6f46f3cc96c3062bee86d05b12a887f: BEGIN
649 | 16:44:08  Opening a new connection, currently in state init
650 | 16:44:08  Redshift adapter: Establishing connection using ssl with `sslmode` set to 'require'.To connect without ssl, set `sslmode` to 'disable'.
651 | 16:44:08  Redshift adapter: Connecting to redshift with username/password based auth...
652 | 16:46:18  Redshift adapter: Error running SQL: BEGIN
653 | 16:46:18  Redshift adapter: Rolling back transaction.
654 | 16:46:18  Redshift adapter: Error running SQL: macro list_relations_without_caching
655 | 16:46:18  Redshift adapter: Rolling back transaction.
656 | 16:46:18  On list_***_***_1711039169_500120b4b6f46f3cc96c3062bee86d05b12a887f: No close available on handle
657 | 16:46:18  Connection 'master' was properly closed.
658 | 16:46:18  Connection 'list_***_***_1711039169_500120b4b6f46f3cc96c3062bee86d05b12a887f' was properly closed.
659 | 16:46:18  Encountered an error:
660 | Database Error
661 | ('communication error', TimeoutError(110, 'Connection timed out'))

This failing macro is basically the same sql statement that we are running in our script. At this point, I am assuming dbt has something to do in relation to how the connection is being handled using redshift-connector. Possibly the best thing now would be to open an issue directly in dbt, but if you know how we may debug further from this point on, I'd be really grateful.

adri commented 2 months ago

@Brooke-white We're also seeing timeouts when running DBT in GitHub Actions. We initially thought the keep-alive setting is the issue, but it seems to be enabled by default. However, we don't see any setting as to how often a keep-alive package is sent. Do you have any more info on this? Maybe the interval is higher than the session timeout. Just a thought