dbt-labs / dbt-redshift

dbt-redshift contains all of the code enabling dbt to work with Amazon Redshift
https://getdbt.com
Apache License 2.0
95 stars 58 forks source link

[ADAP-671] Statements executed one by one instead of as a whole block #521

Open naumovdalex opened 1 year ago

naumovdalex commented 1 year ago

Is this a regression in a recent version of dbt-redshift?

Current Behavior

We have a macro in dbt that runs a set of DDL statements on Redshift. We concatenate those statements in a string (cmd) and run with "{% do run_query(cmd) %}". That string looks like below;

begin;
  ddl_statement_1;
  ddl_statement_2;
  ddl_statement_3;
  ...
  ...
  ddl_statement_x;
end;

After moving to dbt 1.5, we see that those statements are executed one by one, instead of sending the statement as a whole to database. We can see it both in database logs and dbt logs.

Impact

In 1.5 and onwards, these statements take more than 5 minutes to execute. Previously in versions 1.4 and below, they would execute in only a few seconds.

Expected/Previous Behavior

The below statements are executed at once in a single transaction.

begin;
  ddl_statement_1;
  ddl_statement_2;
  ddl_statement_3;
  ...
  ...
  ddl_statement_x;
end;

Steps To Reproduce

  1. Install dbt-core 1.5
  2. Install dot-redshift 1.5.6
  3. Run a set of ddl statements between begin and end in a macro with "{% do run_query(cmd) %}"

Relevant log output

[0m14:34:35.967673 [debug] [MainThread]: On macro_xyz: drop view if exists db_name.schema_name.object_1;
[0m14:34:36.158399 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds
[0m14:34:36.158975 [debug] [MainThread]: Using redshift connection "macro_xyz"
[0m14:34:36.159297 [debug] [MainThread]: On macro_xyz: create view db_name.schema_name.object_1 as select * from db_name.another_schema_name.object_1 with no schema binding;
[0m14:34:36.528864 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds
[0m14:34:36.531540 [debug] [MainThread]: Using redshift connection "macro_xyz"
[0m14:34:36.532585 [debug] [MainThread]: On macro_xyz: grant select on table db_name.schema_name.object_1 to group db_group;
[0m14:34:36.729741 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds
[0m14:34:36.730303 [debug] [MainThread]: Using redshift connection "macro_xyz"
[0m14:34:36.730638 [debug] [MainThread]: On macro_xyz: drop view if exists db_name.schema_name.object_2;
[0m14:34:36.923462 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds
[0m14:34:36.926900 [debug] [MainThread]: Using redshift connection "macro_xyz"
[0m14:34:36.927759 [debug] [MainThread]: On macro_xyz: create view db_name.schema_name.object_2 as select * from db_name.another_schema_name.object_2 with no schema binding;
[0m14:34:37.306524 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds
[0m14:34:37.307047 [debug] [MainThread]: Using redshift connection "macro_xyz"
[0m14:34:37.307360 [debug] [MainThread]: On macro_xyz: grant select on table db_name.schema_name.object_2 to group db_group;
[0m14:34:37.514141 [debug] [MainThread]: SQL status: SUCCESS in 0.0 seconds

Environment

- OS: MacOS 13.4 and Ubuntu
- Python: 3.9.16
- dbt-core (working version): 1.4.6
- dbt-redshift (working version): 1.4.6
- dbt-core (regression version): 1.5.0, 1.5.1, 1.5.2
- dbt-redshift (regression version): 1.5.1, 1.5.2, 1.5.6

Additional Context

No response

jan-benisek commented 1 year ago

We noticed the same with our post-hook, what helped us to set autocommit: False in profiles.yml (more details here)

dataders commented 1 year ago

@naumovdalex sorry to hear your existing workflows is failing. As part of the migration to using the redshift_connector connector library in 1.5 instead of psycopg2, this adapter's approach to transactions have changed such that every query is executed at once.

Check out #463 for more context, and I'd love to see if you get the the intended behavior by doing as @jan-benisek says and adding autocommit: False to your profiles.yml.

I suspect that you may still encounter issues due to design decisions of the underlying connector library, for example see https://github.com/aws/amazon-redshift-python-driver/issues/162.

dataders commented 1 year ago

@naumovdalex I'm going to close this issue for now. If possible, I think this would make a great reproducible use case to rationalize https://github.com/aws/amazon-redshift-python-driver/issues/162. Perhaps open an issue there detailing the limitation?

naumovdalex commented 1 year ago

We noticed the same with our post-hook, what helped us to set autocommit: False in profiles.yml (more details here)

I tried this, unfortunately that didn't help as well. Also, as part of our team uses dbt Cloud IDE as well, this would be a bit more complicated for us

naumovdalex commented 1 year ago

@naumovdalex sorry to hear your existing workflows is failing. As part of the migration to using the redshift_connector connector library in 1.5 instead of psycopg2, this adapter's approach to transactions have changed such that every query is executed at once.

Check out #463 for more context, and I'd love to see if you get the the intended behavior by doing as @jan-benisek says and adding autocommit: False to your profiles.yml.

I suspect that you may still encounter issues due to design decisions of the underlying connector library, for example see aws/amazon-redshift-python-driver#162.

@dataders it's not that the workflows fail, but some macros are taking more than 5 minutes to execute, instead of just a few seconds

dataders commented 1 year ago

I understand now -- that not an ideal experience. @naumovdalex can you please provide a reproducible example? This would help the engineer at Redshift work on the above linked issue on their connector library

amychen1776 commented 1 week ago

@naumovdalex are you still experiencing this issue in the new versions of dbt-redshift?