aws / aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
https://aws-sdk-pandas.readthedocs.io
Apache License 2.0
3.93k stars 701 forks source link

Connecting to a redshift cluster that runs in an other aws account #2986

Open matteoredaelli opened 1 month ago

matteoredaelli commented 1 month ago

Hello

I'd like to implement a lambda in aws account A that connects to a Redshift cluster in aws account B using aws wrangler

import awswrangler as wr
con_redshift = wr.redshift.connect(secret_id="MYSECRET")`

The code can retrieve the secret (it is stored in aws account A) but then it fails with

botocore.errorfactory.ClusterNotFoundFault: An error occurred (ClusterNotFound) when calling the DescribeClusters operation: Cluster my-redshift not found.

This is why redshift is in an other aws account. ;-)

I can reproduce it from command line, setting access and secret key of account A at operating system level.

The secret key MYSECRET is like

{
  "host": "myhost",
  "port": "5439",
  "database": "dbname",
  "user-name": "myuser",
  "password": "dummy",
  "engine": "redshift",
  "dbClusterIdentifier": "clustername"
}

is it possible?

Thanks in advance Matteo

jaidisido commented 1 month ago

The simplest path is to breakdown the calls in two.

First get the secret using the boto3 session of Account A:

value = wr.secretsmanager.get_secret_json(name="my-secret", boto3_session="account-A-boto3-session")

Then use the secret details from the previous step to create a redshift connector connection to the cluster in Account B:

import redshift_connector

con_redshift = redshift_connector.connect(user=..., database=...)

You can then pass that connection to any subsequent redshift call you would be making but using the boto3_session of account B:

df = wr.redshift.read_sql_query(
    sql="SELECT * FROM public.my_table",
    con=con_redshift,
    boto3_session="account-B-boto3-session",
)

Alternatively you can share data cross accounts with Redshift but that might be more involved.