aws / amazon-redshift-python-driver

Redshift Python Connector. It supports Python Database API Specification v2.0.
Apache License 2.0
204 stars 76 forks source link

ArrayContentNotSupportedError #214

Closed stefanks closed 4 months ago

stefanks commented 8 months ago

Driver version

2.1.0

Redshift version

PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.63269

Client Operating System

macOS 14.2.1 (23C71)

Python version

3.12

Table schema

any

Problem description

  1. Expected behaviour: cursor.execute('SELECT * from svv_user_info where user_id in %(p)s', {"p": ('100','101') }) runs without errors
  2. Actual behaviour: cursor.execute('SELECT * from svv_user_info where user_id in %(p)s', {"p": ('100','101') }) errors
  3. Error message/stack trace: redshift_connector.error.ArrayContentNotSupportedError: oid 25 not supported as array contents
  4. Any other details that can be helpful: cursor.execute('SELECT * from svv_user_info where user_id = %(p)s', {"p": '100' }) runs with no problems.
stefanks commented 6 months ago

@Brooke-white I need help with the pull request, can you look at it? Thanks!

Brooke-white commented 4 months ago

Hi @stefanks ,

My apologies for my delay in response on this issue.

Redshift server does not support bind parameters for array based types. My assumption is that this is due to Redshift not fully supporting array types outside of the catalog.

If this behavior worked as expected with a different database driver, my assumption is that the other database driver was not truly using bind parameters -- meaning the bind parameters were embedded in the statement before it was sent to the database.

As such, redshift-connector cannot currently support using arrays as bind parameters.

As bind parameters are the most secure way to pass parameters to a database, I've provided a work around that will allow you to leverage bind parameters despite the limitation with bind parameters for array based types.

While this is not the cleanest way to solve this problem, I hope it can help somewhat.

my_params = [1,2,3]
my_query = "select * from my_table where foo in ({})"
my_query = myquery.format(','.join(["%s"] * len(my_params))

cursor.execute(my_query, my_params)

In parallel, I will discuss this feature request with the Redshift team and provide updates in this issue as I can.