aws / amazon-redshift-python-driver

Redshift Python Connector, which supports the Python Database API Specification v2.0.
Apache License 2.0

Can't connect with assumed role. #192

Closed ryan6310 closed 6 months ago

ryan6310 commented 8 months ago

Driver version

2.0.915

Client Operating System

AWS Linux

Python version

3.10.12

Problem description

I am trying to connect to Redshift from an EC2 instance in a different account from the cluster, but in the same region. I have set up the VPC peering and the needed permissions, largely following this guide. The boto3-based code below connects and queries successfully, but I have been unable to connect using redshift_connector. How does one connect with an assumed role when using redshift_connector?

import boto3
import json
ROLE_ARN = "arn:aws:iam::XXX"
SESSION_NAME = "XXX"
CLUSTER_IDENTIFIER = "XXX"
DB_USER = "XXX"
DB_NAME = "XXX"

# Create an STS client
sts_client = boto3.client('sts')

# Assume the role and save the credentials
assumed_role_object = sts_client.assume_role(
    RoleArn=ROLE_ARN,
    RoleSessionName=SESSION_NAME
)

credentials = assumed_role_object['Credentials']

# Create a Redshift client using the temporary credentials
redshift_data_client = boto3.client(
    'redshift-data',
    aws_access_key_id=credentials['AccessKeyId'],
    aws_secret_access_key=credentials['SecretAccessKey'],
    aws_session_token=credentials['SessionToken'],
)

# Execute a SQL command
sql_statement = "SELECT * FROM TEST_TABLE LIMIT 5"

response = redshift_data_client.execute_statement(
    ClusterIdentifier=CLUSTER_IDENTIFIER,
    Database=DB_NAME,
    DbUser=DB_USER,
    Sql=sql_statement
)

query_id = response['Id']

# Wait for the statement to complete, pausing between polls
import time

statement_finished = False
while not statement_finished:
    status = redshift_data_client.describe_statement(Id=query_id)
    query_status = status['Status']

    if query_status in ['FINISHED', 'FAILED', 'ABORTED']:
        statement_finished = True
    else:
        time.sleep(1)  # avoid a tight busy-wait against the API

# Once the query is done, retrieve the result
if query_status == 'FINISHED':
    result = redshift_data_client.get_statement_result(Id=query_id)
    print("Query Result:", json.dumps(result))
else:
    print(f"Query {query_status}")

I cannot get redshift_connector to work. This code leads to the error below.

conn = redshift_connector.connect(
    iam=True,
    db_user=DB_USER,
    database=DB_NAME,
    cluster_identifier=CLUSTER_IDENTIFIER,
    access_key_id=credentials['AccessKeyId'],
    secret_access_key=credentials['SecretAccessKey'],
    session_token=credentials['SessionToken'],
    region="us-west-2",
)

Python Driver trace logs

Traceback (most recent call last):
  File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/redshift_connector/core.py", line 634, in __init__
    self._usock.connect(hostport)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/ec2-user/redshift/query_redshift.py", line 138, in <module>
    redshift_connector.connect(
  File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/redshift_connector/__init__.py", line 396, in connect
    return Connection(
  File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/redshift_connector/core.py", line 690, in __init__
    raise OperationalError("connection time out", timeout_error)
redshift_connector.error.OperationalError: ('connection time out', TimeoutError(110, 'Connection timed out'))
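Since the failure above is a raw TCP timeout rather than an authentication error, it can help to verify network reachability to the cluster endpoint independently of the driver. A minimal sketch using only the standard library (the hostname and port in the example comment are placeholders):

```python
import socket

def is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: test the endpoint and port reported for the cluster
# is_reachable("example-cluster.abc123.us-west-2.redshift.amazonaws.com", 5439)
```

If this returns False from the EC2 instance, the problem is in the network path (VPC peering, route tables, or the cluster's security group), not in the driver.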
Brooke-white commented 8 months ago

Hi @ryan6310 ,

Thanks for reaching out. Looking at the trace, it appears the issue is hit when attempting to establish a socket connection to the Redshift cluster. We attempt to connect to the cluster after performing the authentication steps.

In your case, the authentication steps would be taking the AWS keys passed to connect() and using them with the boto3 Redshift client to 1) get temporary IAM credentials for connecting to Redshift and 2) get the cluster endpoint and port.

I'd recommend the following:

1. Enable debug logging for this scenario.
2. Scan the debug log for a Redshift endpoint and confirm the endpoint is as expected.
3. E-mail the debug log to redshift-drivers[at]amazon.com, mentioning this issue, and I will take a look at the logs.
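For step 1, the driver's debug output can be captured with Python's standard logging module; a minimal sketch (the log file name is arbitrary):

```python
import logging

# Route redshift_connector's internal debug output to a file
logger = logging.getLogger("redshift_connector")
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.FileHandler("redshift_connector.log"))
```

With this in place before calling connect(), the log should include the endpoint the driver resolved for the cluster.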

Brooke-white commented 6 months ago

Please re-open if further assistance is needed.