aws / amazon-redshift-python-driver

Redshift Python Connector. It supports Python Database API Specification v2.0.
Apache License 2.0

Can't connect to Redshift from a script running in SageMaker #190

Closed guilhermeparreira closed 11 months ago

guilhermeparreira commented 1 year ago

Driver version

Redshift version

PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.54052

Client Operating System

Python version

3.9.12

Table schema

CREATE TABLE dm_spd.demand_planning_previsao (
    date date ENCODE az64,
    pred bigint ENCODE az64,
    forecast character varying(256) ENCODE lzo,
    forecast_collaboration double precision ENCODE raw,
    market character varying(256) ENCODE lzo,
    model character varying(256) ENCODE lzo,
    cycle date ENCODE az64,
    exog_variables character varying(256) ENCODE lzo,
    category character varying(256) ENCODE lzo,
    subgroup character varying(256) ENCODE lzo,
    solution character varying(256) ENCODE lzo,
    best_model character varying(256) ENCODE lzo,
    order_model bigint ENCODE az64,
    last_load_update timestamp without time zone ENCODE az64
) DISTSTYLE AUTO;

Problem description

I can connect to Redshift from my machine or from AWS Glue using the following code:

import redshift_connector
conn = redshift_connector.connect(
    host=XXX,
    database=XXX,
    user=XXX,
    password=XXX,
    )
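
(For reference, a minimal query round-trip with the connector once conn is established; the COUNT against the table above is only illustrative.)

cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM dm_spd.demand_planning_previsao")
print(cursor.fetchone())  # simple round-trip to verify the connection
cursor.close()
conn.close()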

However, my production code works a bit differently: I start a DAG in Airflow, which creates a processing job in Amazon SageMaker that contains this script.

  1. Expected behaviour: it connects.
  2. Actual behaviour: it does not connect.
  3. Error message/stack trace: attached as an image.
  4. Any other details that can be helpful: my production Redshift cluster is protected by a VPC, so we tried to set the VPC when we created the processing job. The code below does not work yet, but I want to know whether I am on the right track:
from sagemaker.workflow.steps import ProcessingStep
from sagemaker.workflow.vpc_config import VpcConfig
from sagemaker.workflow.properties import PropertyFile
from sagemaker.workflow.properties import PropertyType

machine = "ml.c5.4xlarge"
region = 'us-east-1'

vpc_id = 'vpc-0692c88c3a53bdf34'
subnet_ids = ['us-east-1c']
security_group_ids = ['default']

# Create a VPC configuration object
vpc_config = VpcConfig(security_group_ids=security_group_ids,  # Replace with your security group IDs
                       subnets=subnet_ids)

name = 'demand-planning-' + task
ProcessingStep(name=name,
               processor=processor,   # Replace with your processor configuration
               inputs=[...],          # Define your processing inputs
               outputs=[...],         # Define your processing outputs
               job_arguments=[...],   # Define your job arguments
               depends_on=depends_on,
               property_files=[PropertyFile(name="VpcConfig",
                                            output_name="vpcConfig",
                                            property_type=PropertyType.VpcConfig,
                                            value=vpc_config)])

Do you have any clues/best practices for connecting SageMaker to Redshift?

Brooke-white commented 1 year ago

Hi @guilhermeparreira

Thanks for reaching out. I understand you're seeing issues connecting to Redshift using username/password authentication.

I've found some recommendations for approaching this issue on this AWS re:Post thread. There are different options for resolution depending on whether your Redshift instance is public or private. My guess would be to take a look at how SageMaker is configured with the VPC, as the issue may lie there. For the best guidance on securely setting up your VPC and the connections between SageMaker and Redshift, I'd recommend reaching out to AWS Support if you have any hesitancy.
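
As a rough sketch of that direction for a private cluster: the VPC settings for a processing job are usually attached to the processor itself via the SageMaker SDK's NetworkConfig rather than passed as a PropertyFile. The subnet and security-group IDs below are placeholders, and the security group attached to Redshift still needs to allow inbound traffic on port 5439 from the job's security group.

from sagemaker.network import NetworkConfig

# Real values are subnet IDs (subnet-...) in the cluster's VPC and security group IDs (sg-...),
# not availability-zone names or the literal string 'default'.
network_config = NetworkConfig(
    subnets=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
)

# The processor carries the network configuration, and the step just references the processor:
# processor = ScriptProcessor(..., network_config=network_config)
# ProcessingStep(name=name, processor=processor, ...)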

Brooke-white commented 11 months ago

Closing out due to inactivity. Please re-open if you are still experiencing this issue.