astronomer / astro-sdk

Astro SDK allows rapid and clean development of {Extract, Load, Transform} workflows using Python and SQL, powered by Apache Airflow.
https://astro-sdk-python.rtfd.io/
Apache License 2.0
350 stars 43 forks source link

Fix Redshift tests #1010

Open utkarsharma2 opened 2 years ago

utkarsharma2 commented 2 years ago

Describe the bug In CI there are flaky tests, we need to fix - https://github.com/astronomer/astro-sdk/actions/runs/3197623724/jobs/5221086373

The following test is flaky i.e. failing every now and then:

FAILED tests/sql/operators/test_load_file.py::test_aql_replace_existing_table[redshift] - sqlalchemy.exc.ProgrammingError: (redshift_connector.error.ProgrammingError) ***'S': 'ERROR', 'C': 'XX000', 'M': 'could not find tuple for constraint 463297', 'F': '../src/pg/src/backend/utils/adt/ruleutils.c', 'L': '2750', 'R': 'pg_get_constraintdef_worker'***
[SQL: 
        SELECT
          n.nspname as "schema",
          c.relname as "table_name",
          t.contype,
          t.conname,
          t.conkey,
          a.attnum,
          a.attname,
          pg_catalog.pg_get_constraintdef(t.oid, true) as condef,
          n.oid as "schema_oid",
          c.oid as "rel_oid"
        FROM pg_catalog.pg_class c
        LEFT JOIN pg_catalog.pg_namespace n
          ON n.oid = c.relnamespace
        JOIN pg_catalog.pg_constraint t
          ON t.conrelid = c.oid
        JOIN pg_catalog.pg_attribute a
          ON t.conrelid = a.attrelid AND a.attnum = ANY(t.conkey)
        WHERE n.nspname !~ '^pg_'
        ORDER BY n.nspname, c.relname
        ]
(Background on this error at: https://sqlalche.me/e/14/f405)

see here for example.

Version

To Reproduce Steps to reproduce the behavior: Run CI multiple times you'll see CI jobs failing.

Expected behavior An unrelated CI job should not fail.

pankajastro commented 1 year ago

recently also failed with similar error https://github.com/astronomer/astro-sdk/actions/runs/3523069879/jobs/5906787666

sunank200 commented 1 year ago

@pankajastro it would be good to add the findings you were mentioning in this thread as well

pankajastro commented 1 year ago

This look like an issue related to concurrent run. I checked the original-reported logs which say some requested resources were not found. My hunch is somehow the resource got deleted but for a particular reported test currently, we create tmp table so each run should have its own table this also does not explain the actual issue.

Just a couple of hours back I hit a similar error on newly added tests and here we have a non-tmp table. https://github.com/astronomer/astro-sdk/actions/runs/3523069879/jobs/5906787666

there are some discussions on this similar issue https://github.com/sqlalchemy-redshift/sqlalchemy-redshift/issues/104 https://docs.aws.amazon.com/redshift/latest/dg/c_serial_isolation.html

I'm not able to reproduce the issue on reported tests and added the fixes for newly added tests here