aws / amazon-redshift-python-driver

Redshift Python Connector. It supports the Python Database API Specification v2.0.
Apache License 2.0

struct.error: unpack_from requires a buffer of at least 5 bytes for unpacking 5 bytes at offset 0 #176

Closed bradleybonitatibus closed 6 months ago

bradleybonitatibus commented 11 months ago

Driver version

redshift-connector==2.0.909

Redshift version

PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.52943 Using Redshift Serverless

Client Operating System

Python version

python3.10

Table schema

Problem description

  1. Expected behaviour: the query should drop the table

  2. Actual behaviour: an exception is raised

  3. Error message/stack trace:

    Traceback (most recent call last):
      File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/operators/redshift_sql.py", line 82, in execute
        hook.run(self.sql, autocommit=self.autocommit, parameters=self.parameters)
      File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/common/sql/hooks/sql.py", line 288, in run
        with closing(self.get_conn()) as conn:
      File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/hooks/redshift_sql.py", line 127, in get_conn
        return redshift_connector.connect(**conn_kwargs)
      File "/home/airflow/.local/lib/python3.10/site-packages/redshift_connector/__init__.py", line 340, in connect
        return Connection(
      File "/home/airflow/.local/lib/python3.10/site-packages/redshift_connector/core.py", line 707, in __init__
        code, data_len = ci_unpack(self._read(5))
    struct.error: unpack_from requires a buffer of at least 5 bytes for unpacking 5 bytes at offset 0 (actual buffer size is 0)
  4. Any other details that can be helpful: Called through RedshiftSQLOperator in apache-airflow==2.4.3

Python Driver trace logs

Reproduction code

from airflow.models import DAG
from airflow.providers.amazon.aws.operators.redshift_sql import RedshiftSQLOperator

with DAG(...) as dag:
    drop_if_exists = RedshiftSQLOperator(
        task_id="...",
        sql="DROP TABLE IF EXISTS {{ params.schema }}.{{ params.table }}",
        params={"schema": "...", "table": "..."},
    )
...
Brooke-white commented 11 months ago

Hi @bradleybonitatibus , does this issue occur consistently or only sometimes?

bradleybonitatibus commented 11 months ago

Hey @Brooke-white! Very infrequently. Raised this issue as it impacted one of our production serverless clusters.

Brooke-white commented 11 months ago

Thanks for that information, @bradleybonitatibus. There are two cases I'm aware of where this error presents. My guess is you are hitting case 1 since you're using Redshift serverless and this issue presents very infrequently.

  1. When "cold", Redshift Serverless can take some time to become ready to accept connections and queries. Retry logic is the workaround in this case.
  2. #157, tl;dr: there is a Redshift server-side issue with select version() which causes that query to fail. Clients that directly or indirectly issue this statement experience more consistent failures; SQLAlchemy is one such client.
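For case 1, the suggested retry logic might look like the minimal sketch below. The helper name, backoff parameters, and exception choice are illustrative assumptions, not part of the driver; in practice you would pass redshift_connector.connect as the connect callable.

```python
import struct
import time


def connect_with_retry(connect, conn_kwargs, max_attempts=5, base_delay=2.0):
    """Retry a flaky connect call to ride out serverless cold starts.

    Hypothetical helper: pass redshift_connector.connect as `connect`
    and your connection parameters as `conn_kwargs`.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return connect(**conn_kwargs)
        except struct.error:
            # The buffer error surfaces when the server closes the socket
            # before the startup handshake completes; back off and retry.
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

In the Airflow setup above, the equivalent fix is to configure retries on the task itself (e.g. the standard retries and retry_delay task arguments), so the operator re-runs when the connection attempt fails.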

bradleybonitatibus commented 11 months ago

Got it, will go down the suggested path of adding retry logic to handle this scenario. Thanks for your guidance!

dbeatty10 commented 11 months ago

This error message looks to me like the database connection was lost.
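That reading is consistent with the stack trace: core.py reads a 5-byte message header from the socket, and a closed connection makes that read return an empty buffer, which is exactly what struct complains about. A minimal sketch of the failure mode, assuming (as the name ci_unpack in the trace suggests) the header is a 1-byte message code plus a 4-byte big-endian length; the byte values below are illustrative:

```python
import struct

# A 5-byte header: 1-byte message code ("c") + 4-byte big-endian int ("i").
ci_unpack = struct.Struct("!ci").unpack_from

# Healthy case: the server sent a full header.
code, data_len = ci_unpack(b"R\x00\x00\x00\x08")  # code=b"R", data_len=8

# Dropped connection: the socket read returns b"", and unpacking raises a
# struct.error like the one in the stack trace.
try:
    ci_unpack(b"")
except struct.error as exc:
    print(exc)
```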

Brooke-white commented 6 months ago

Improvements to the error message for this case were made in version 2.0.914.