dbt-labs / dbt-redshift

dbt-redshift contains all of the code enabling dbt to work with Amazon Redshift
https://getdbt.com
Apache License 2.0

[Regression] DBT 1.7.3 violates Redshift transactions. #766

Closed maksym-dibrov-solidgate closed 4 months ago

maksym-dibrov-solidgate commented 5 months ago

Is this a regression in a recent version of dbt-core?

Current Behavior

After updating DBT from 1.1.0 to 1.7.3 we started receiving errors either when running dbt run or when reading tables that are being fully refreshed by DBT at the moment they are read. This behaviour never appeared on DBT==1.1.0, so the issue seems to lie with the newer version of DBT rather than with the DB itself. The error is:

Redshift error: could not open relation with OID 6134794

Expected/Previous Behavior

The issue never appeared on DBT==1.1.0, and we were able to read tables being fully refreshed without problems.

Steps To Reproduce

1) ENV Pipfile

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true
[packages]
dbt-core = "==1.7.3"
dbt-redshift = "==1.7.0"
elementary-data = {extras = ["redshift"], version = "0.13.0"}
tzlocal = "==4.3"
[requires]
python_version = "3.11"

dbt-packages

packages:
  - package: dbt-labs/dbt_utils
    version: 0.8.5
  - package: elementary-data/elementary
    version: 0.13.0

2) CONFIG

models:
  our_dbt_project:
    +enabled: true
    +full_refresh: false
    +meta:
      alerts_config:
        alert_fields: ["table", "description", "result_message", "test_parameters", "test_query", "test_results_sample"]

3) RUN

pipenv run dbt run --debug --profiles-dir our_profile --project-dir our_project --models +tag:our_tag

Relevant log output

[0m07:01:01  Running with dbt=1.7.3
[0m07:01:01  running dbt with arguments {'printer_width': '120', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'warn_error': 'None', 'version_check': 'True', 'profiles_dir': 'our_project', 'log_path': 'our_project/logs', 'fail_fast': 'False', 'debug': 'True', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'invocation_command': 'dbt run --debug --profiles-dir our_profile --project-dir our_project --models +tag:our_tag', 'log_format': 'default', 'introspect': 'True', 'target_path': 'None', 'static_parser': 'True', 'send_anonymous_usage_stats': 'False'}
[0m07:01:02  Registered adapter: redshift=1.7.0
[0m07:01:02  checksum: <idk_maybe_this_is_sensitive_1234>, vars: {}, profile: , target: , version: 1.7.3
[0m07:01:02  Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
[0m07:01:02  Partial parsing enabled, no changes found, skipping parsing
[0m07:01:03  Found 52 models, 2 operations, 96 tests, 41 sources, 0 exposures, 0 metrics, 1285 macros, 0 groups, 0 semantic models
[0m07:01:03
[0m07:01:03  Acquiring new redshift connection 'master'
[0m07:01:03  Acquiring new redshift connection 'list_dev'
[0m07:01:03  Acquiring new redshift connection 'list_dev'
[0m07:01:03  Using redshift connection "list_dev"
[0m07:01:03  On list_dev: /* {"app": "dbt", "dbt_version": "1.7.3", "profile_name": "our_profile", "target_name": "dev", "connection_name": "list_dev"} */

    select distinct nspname from pg_namespace
[0m07:01:03  Opening a new connection, currently in state init
[0m07:01:03  Redshift adapter: Establishing connection using ssl with `sslmode` set to 'prefer'.To connect without ssl, set `sslmode` to 'disable'.
[0m07:01:03  Redshift adapter: Connecting to redshift with username/password based auth...
[0m07:01:03  Using redshift connection "list_dev"
[0m07:01:03  On list_dev: /* {"app": "dbt", "dbt_version": "1.7.3", "profile_name": "our_profile", "target_name": "dev", "connection_name": "list_dev"} */

    select distinct nspname from pg_namespace
[0m07:01:03  Opening a new connection, currently in state init
[0m07:01:03  Redshift adapter: Establishing connection using ssl with `sslmode` set to 'prefer'.To connect without ssl, set `sslmode` to 'disable'.
[0m07:01:03  Redshift adapter: Connecting to redshift with username/password based auth...
[0m07:01:03  SQL status: SUCCESS in 0.0 seconds
[0m07:01:03  On list_dev: Close
[0m07:01:03  SQL status: SUCCESS in 0.0 seconds
[0m07:01:03  On list_dev: Close
[0m07:01:03  Re-using an available connection from the pool (formerly list_dev, now list_dev_admin)
[0m07:01:03  Re-using an available connection from the pool (formerly list_dev, now list_dev_elementary)
[0m07:01:03  Acquiring new redshift connection 'list_dev_data_mart_views'
[0m07:01:03  Using redshift connection "list_dev_elementary"
[0m07:01:03  Using redshift connection "list_dev_admin"
[0m07:01:03  Using redshift connection "list_dev_data_mart_views"
[0m07:01:03  On list_dev_elementary: BEGIN
[0m07:01:03  On list_dev_admin: BEGIN
[0m07:01:03  On list_dev_data_mart_views: BEGIN
[0m07:01:03  Opening a new connection, currently in state closed
[0m07:01:03  Opening a new connection, currently in state init
[0m07:01:03  Redshift adapter: Establishing connection using ssl with `sslmode` set to 'prefer'.To connect without ssl, set `sslmode` to 'disable'.
[0m07:01:03  Redshift adapter: Establishing connection using ssl with `sslmode` set to 'prefer'.To connect without ssl, set `sslmode` to 'disable'.
[0m07:01:03  Redshift adapter: Establishing connection using ssl with `sslmode` set to 'prefer'.To connect without ssl, set `sslmode` to 'disable'.
[0m07:01:03  Redshift adapter: Connecting to redshift with username/password based auth...
[0m07:01:03  Redshift adapter: Connecting to redshift with username/password based auth...
[0m07:01:03  SQL status: SUCCESS in 0.0 seconds
[0m07:01:03  Using redshift connection "list_dev_admin"
[0m07:01:03  On list_dev_admin: /* {"app": "dbt", "dbt_version": "1.7.3", "profile_name": "our_profile", "target_name": "dev", "connection_name": "list_dev_admin"} */
select
        table_catalog as database,
        table_name as name,
        table_schema as schema,
        'table' as type
    from information_schema.tables
    where table_schema ilike 'admin'
    and table_type = 'BASE TABLE'
    union all
    select
      table_catalog as database,
      table_name as name,
      table_schema as schema,
      case
        when view_definition ilike '%create materialized view%'
          then 'materialized_view'
        else 'view'
      end as type
    from information_schema.views
    where table_schema ilike 'admin'
[0m07:01:03  SQL status: SUCCESS in 0.0 seconds
[0m07:01:03  Using redshift connection "list_dev_elementary"
[0m07:01:03  SQL status: SUCCESS in 0.0 seconds
[0m07:01:03  On list_dev_elementary: /* {"app": "dbt", "dbt_version": "1.7.3", "profile_name": "our_profile", "target_name": "dev", "connection_name": "list_dev_elementary"} */
select
        table_catalog as database,
        table_name as name,
        table_schema as schema,
        'table' as type
    from information_schema.tables
    where table_schema ilike 'elementary'
    and table_type = 'BASE TABLE'
    union all
    select
      table_catalog as database,
      table_name as name,
      table_schema as schema,
      case
        when view_definition ilike '%create materialized view%'
          then 'materialized_view'
        else 'view'
      end as type
    from information_schema.views
    where table_schema ilike 'elementary'
[0m07:01:03  Using redshift connection "list_dev_data_mart_views"
[0m07:01:03  On list_dev_data_mart_views: /* {"app": "dbt", "dbt_version": "1.7.3", "profile_name": "our_profile", "target_name": "dev", "connection_name": "list_dev_data_mart_views"} */
select
        table_catalog as database,
        table_name as name,
        table_schema as schema,
        'table' as type
    from information_schema.tables
    where table_schema ilike 'data_mart_views'
    and table_type = 'BASE TABLE'
    union all
    select
      table_catalog as database,
      table_name as name,
      table_schema as schema,
      case
        when view_definition ilike '%create materialized view%'
          then 'materialized_view'
        else 'view'
      end as type
    from information_schema.views
    where table_schema ilike 'data_mart_views'
[0m07:01:03  SQL status: SUCCESS in 0.0 seconds
[0m07:01:03  On list_dev_admin: ROLLBACK
[0m07:01:03  On list_dev_admin: Close
[0m07:01:03  Re-using an available connection from the pool (formerly list_dev_admin, now list_dev_data_mart)
[0m07:01:03  SQL status: SUCCESS in 0.0 seconds
[0m07:01:03  On list_dev_elementary: ROLLBACK
[0m07:01:03  Using redshift connection "list_dev_data_mart"
[0m07:01:03  On list_dev_data_mart: BEGIN
[0m07:01:03  On list_dev_elementary: Close
[0m07:01:03  Opening a new connection, currently in state closed
[0m07:01:03  Re-using an available connection from the pool (formerly list_dev_elementary, now list_dev_dbt_test__audit)
[0m07:01:03  Redshift adapter: Establishing connection using ssl with `sslmode` set to 'prefer'.To connect without ssl, set `sslmode` to 'disable'.
[0m07:01:03  Redshift adapter: Connecting to redshift with username/password based auth...
[0m07:01:03  Using redshift connection "list_dev_dbt_test__audit"
[0m07:01:03  On list_dev_dbt_test__audit: BEGIN
[0m07:01:03  Opening a new connection, currently in state closed
[0m07:01:03  Redshift adapter: Establishing connection using ssl with `sslmode` set to 'prefer'.To connect without ssl, set `sslmode` to 'disable'.
[0m07:01:03  Redshift adapter: Connecting to redshift with username/password based auth...
[0m07:01:04  SQL status: SUCCESS in 0.0 seconds
[0m07:01:04  SQL status: SUCCESS in 0.0 seconds
[0m07:01:04  Using redshift connection "list_dev_data_mart"
[0m07:01:04  Using redshift connection "list_dev_dbt_test__audit"
[0m07:01:04  On list_dev_data_mart: /* {"app": "dbt", "dbt_version": "1.7.3", "profile_name": "our_profile", "target_name": "dev", "connection_name": "list_dev_data_mart"} */
select
        table_catalog as database,
        table_name as name,
        table_schema as schema,
        'table' as type
    from information_schema.tables
    where table_schema ilike 'data_mart'
    and table_type = 'BASE TABLE'
    union all
    select
      table_catalog as database,
      table_name as name,
      table_schema as schema,
      case
        when view_definition ilike '%create materialized view%'
          then 'materialized_view'
        else 'view'
      end as type
    from information_schema.views
    where table_schema ilike 'data_mart'
[0m07:01:04  On list_dev_dbt_test__audit: /* {"app": "dbt", "dbt_version": "1.7.3", "profile_name": "our_profile", "target_name": "dev", "connection_name": "list_dev_dbt_test__audit"} */
select
        table_catalog as database,
        table_name as name,
        table_schema as schema,
        'table' as type
    from information_schema.tables
    where table_schema ilike 'dbt_test__audit'
    and table_type = 'BASE TABLE'
    union all
    select
      table_catalog as database,
      table_name as name,
      table_schema as schema,
      case
        when view_definition ilike '%create materialized view%'
          then 'materialized_view'
        else 'view'
      end as type
    from information_schema.views
    where table_schema ilike 'dbt_test__audit'
[0m07:01:04  SQL status: SUCCESS in 0.0 seconds
[0m07:01:04  On list_dev_data_mart: ROLLBACK
[0m07:01:04  SQL status: SUCCESS in 0.0 seconds
[0m07:01:04  On list_dev_dbt_test__audit: ROLLBACK
[0m07:01:04  On list_dev_data_mart: Close
[0m07:01:04  On list_dev_dbt_test__audit: Close
[0m07:01:04  Redshift adapter: Redshift error: could not open relation with OID 6134794
[0m07:01:04  On list_dev_data_mart_views: ROLLBACK
[0m07:01:04  Redshift adapter: Error running SQL: macro list_relations_without_caching
[0m07:01:04  Redshift adapter: Rolling back transaction.
[0m07:01:04  On list_dev_data_mart_views: Close
[0m07:01:04  Connection 'master' was properly closed.
[0m07:01:04  Connection 'list_dev_dbt_test__audit' was properly closed.
[0m07:01:04  Connection 'list_dev_data_mart' was properly closed.
[0m07:01:04  Connection 'list_dev_data_mart_views' was properly closed.
[0m07:01:04
[0m07:01:04  Finished running  in 0 hours 0 minutes and 1.13 seconds (1.13s).
[0m07:01:04  Encountered an error:
Database Error
  could not open relation with OID 6134794
[0m07:01:04  Resource report: {"command_name": "run", "command_wall_clock_time": 2.5737379, "process_user_time": 6.363262, "process_kernel_time": 0.56465, "process_mem_max_rss": "232380", "process_out_blocks": "4936", "command_success": false, "process_in_blocks": "0"}
[0m07:01:04  Command `dbt run` failed at 07:01:04.285960 after 2.58 seconds
[0m07:01:04  Flushing usage events

Environment

- OS: Ubuntu 20.04
- Python: 3.11
- dbt (working version): 1.1.0
- dbt (regression version): 1.7.3

Which database adapter are you using with dbt?

redshift

Additional Context

Started a thread about this issue in the dbt Community Slack: https://getdbt.slack.com/archives/CJARVS0RY/p1712935236552749

dataders commented 5 months ago

hi @maksym-dibrov-solidgate thank you for writing this up. I agree that there's something seriously wonky about how dbt-redshift handles transactions after version 1.5.

The predominant reason we cannot support user-defined transactions is a limitation of the Redshift driver, see https://github.com/aws/amazon-redshift-python-driver/issues/162. Ultimately, I feel that until that limitation is addressed, we cannot confidently address transactions in dbt without significant additional investment.

In the meantime, I'd be very interested to know how your project is configured that causes it to fail.

Can you share more about how your project is configured such that it fails after one second?

More importantly, can you help me create a minimal reproducible example of the behavior? That is, starting with the vanilla jaffle_shop, what's the least I'd have to modify to reproduce the behavior you are seeing?

Lastly, I'm curious whether the behavior is different if you specify autocommit: False in your profiles.yml (more info on autocommit).
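
For reference, a minimal sketch of what that could look like in profiles.yml; the profile name matches the reproduction above, but the host, user, password, dbname, and schema values are placeholders rather than anything taken from this issue:

our_profile:
  target: dev
  outputs:
    dev:
      type: redshift
      host: example-cluster.abc123.eu-west-1.redshift.amazonaws.com  # placeholder
      user: dbt_user                                                 # placeholder
      password: "{{ env_var('REDSHIFT_PASSWORD') }}"                 # placeholder
      dbname: analytics                                              # placeholder
      schema: data_mart                                              # placeholder
      port: 5439
      threads: 4
      autocommit: false  # the setting suggested above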

maksym-dibrov-solidgate commented 5 months ago

@dataders thank you for replying to this issue! I will try to reproduce it on vanilla jaffle_shop and come back to this thread when I succeed

martynydbt commented 4 months ago

We see in the slack thread linked here that this is fixed in 1.7.7, closing. :D

maksym-dibrov-solidgate commented 4 months ago

@martynydbt the thread says it is fixed in 1.7.5. I updated to 1.7.5 but it still throws the same error, so these two issues should be unrelated

maksym-dibrov-solidgate commented 4 months ago

Tried with dbt-redshift==1.7.7 - still the same error

colin-rogers-dbt commented 4 months ago

@maksym-dibrov-solidgate (on slack): tried with 1.4.9 - still the same issue. Table materialization style is "table" and I get OID does not exist errors when trying to read it during a dbt run

maksym-dibrov-solidgate commented 4 months ago

I figured out what the issue was and fixed it. Here is what happened:

We had two separate processes that both threw the same OID does not exist... error. These processes were:

  1. When executing dbt run, the command would fail before it started running the models;
  2. When reading a table configured as materialized = table, the service reading it (which uses the GORM library to query Redshift) would occasionally receive an OID does not exist error exactly while the model was being materialized.

As it turned out, both symptoms have the same root cause: some object (view/table) is dropped outside of a single transaction while another process expects that object to still be there.

In the second case, it looks like dbt renames and drops tables in two separate transactions when called with the --full-refresh flag, or when the model's materialization implies that the relation is dropped on every run (e.g. materialized in (view, table)). In the first case, we had started two dbt runs in two separate bash sessions: run_A was materializing table_A in schema_for_tables and run_B was materializing table_A in schema_for_tables. Both runs also executed an upstream dependency that dropped view_A and view_B in schema_for_views, respectively. At some point run_A dropped view_A in order to rebuild it from scratch, while the list_relations_without_caching macro executed by run_B expected view_A to still exist in schema_for_views. Hence the error.

To me this looks like a transactionality problem, and since the transaction boundaries of these operations are determined by dbt, it seems it could be fixed without intervention from the AWS Redshift developers.

colin-rogers-dbt commented 4 months ago

Thanks for clarifying @maksym-dibrov-solidgate! I discussed this with the team; this behavior is due to dbt not supporting parallel runs. Specifically, we're limited by:

  1. Each run having its own cache
  2. Updates that cascade to downstream dependencies not being wrapped in a single transaction (i.e. a cascading drop/recreate of table_A and view_A). Currently dbt-redshift doesn't "know" about view_A when it is executing table_A, so it isn't possible for us to do this at this time.

Unfortunately this just isn't functionality that we can support, and we have no plans to do so for now.

jtcohen6 commented 4 months ago

Thanks @colin-rogers-dbt! One additional detail: We strongly encourage folks to use late-binding views on Redshift, which should avoid this issue of concurrent cascade-dropping (table_A -> view_A): https://docs.getdbt.com/reference/resource-configs/redshift-configs#late-binding-views
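
For example, a minimal sketch of setting this project-wide in dbt_project.yml; the project name is taken from the reproduction above, and the same config can instead be set on individual view models:

models:
  our_dbt_project:
    +bind: false  # create views as late-binding (WITH NO SCHEMA BINDING), so dropping an upstream table doesn't cascade to the view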

victor-frank-signet commented 1 month ago

This is still a problem in 1.8.4. I'm not sure whether this was closed because a spike was opened or because of misinformation, but it is still a problem. Same query as the one listed at the start of this issue, and the same error: could not open relation with OID 42762712.

Here is one of my errors from today:

/ {"app": "dbt", "dbt_version": "1.8.4", "profile_name": "dev", "target_name": "env", "connection_name": "list_db_schema"} /\nselect\n table_catalog as database,\n table_name as name,\n table_schema as schema,\n 'table' as type\n from information_schema.tables\n where table_schema ilike ''\n and table_type = 'BASE TABLE'\n union all\n select\n table_catalog as database,\n table_name as name,\n table_schema as schema,\n case\n when view_definition ilike '%create materialized view%'\n then 'materialized_view'\n else 'view'\n end as type\n from information_schema.views\n where table_schema ilike ''

It is ALWAYS the same type of query. SELECT .. FROM information_schema.tables...

We are seeing this a lot. We run in Airflow with a bash operator starting dbt run ... --select model.*: roughly 100 Airflow DAGs, times let's say 5 tasks per DAG, with each task running 1-3 models. Some DAGs run hourly/daily/weekly, so we have lots of concurrent dbt run executions at any one point in time.

I actually think dbt-redshift knew about this error in 1.6 but for some reason did not fix it: https://github.com/dbt-labs/dbt-redshift/pull/569/files#diff-4ab8ac7de10abb4128d1534093662382aaeb4e8a99ed2f3f880a4d9720717d00. That test is effectively a reproduction of the issue. Maybe there is a way I, as a dbt user, can just add a bunch of retry logic to the information_schema queries until redshift-connector is fixed? (See the sketch below.)
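
One thing that might be worth trying in the meantime (an assumption on my part, not something confirmed anywhere in this thread): the dbt-redshift profile accepts a retries setting, although I'm not certain whether it retries failed metadata queries like these or only failed connection attempts. A sketch of the relevant target block, with placeholder values:

dev:
  type: redshift
  # ... connection details as usual ...
  threads: 2   # fewer threads, as noted below, reduces concurrent metadata queries
  retries: 3   # assumption: this may only cover connection errors, not the information_schema queries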

I think this is related to locking/transaction concurrency. It never occurs in our pre-prod environment, which is scheduled ad hoc with less concurrency. In Airflow, if I want to reduce this error on dbt-redshift==1.8.4, I need to reduce the number of concurrent tasks and reduce the threads in the dbt profile. Once I make those changes the error occurs less frequently intra-DAG (but I still have multiple DAGs running at any one time).

I can see why late-binding views are an option, since that would reduce the information_schema queries performed by dbt and thus avoid this concurrency/locking issue.

But I am using this in production today and receiving 100 alerts a day due to this behavior. Is another solution to pin dbt-redshift to a version prior to the switch to redshift-connector (e.g. 1.4.0)? This conversation seems to suggest that fixing dbt-redshift first requires waiting on redshift-connector to fix some things.

victor-frank-signet commented 1 month ago

Update: I was just notified that there is a forthcoming Redshift patch, P184, to be applied in Q3 2024, which should fix the could not open relation with OID error.

amychen1776 commented 4 weeks ago

I'm curious, for those who are still seeing this issue: have you opened an issue with AWS? That is the best way for us (dbt Labs) to advocate to the Redshift team that we need fixes for the Redshift driver.

maksym-dibrov-solidgate commented 3 weeks ago

@amychen1776 I haven't opened an issue with AWS. If you open one, I will be happy to give it a thumbs up.

amychen1776 commented 3 weeks ago

@maksym-dibrov-solidgate I recommend doing it through AWS Support and opening up your own ticket (so you wouldn't be able to just give a thumbs up). In working with the Redshift team, that is how they have recommended flagging issues like this.