Closed: ckdake closed this issue 1 month ago
This is very similar to https://github.com/dbt-labs/dbt-core/issues/5489
Thanks for opening this @ckdake !
Indeed, dbt-labs/dbt-core#5489 does look similar.
Did you get a chance to read the following?
I haven't done a deep-dive here to understand all the key details, but I wonder if https://github.com/dbt-labs/dbt-snowflake/pull/428 addresses the issue for Snowflake enough for us to undo the relevant portions of https://github.com/dbt-labs/dbt-core/issues/2645.
I'm doing more of a deep dive here than planned, and learning a lot :)
It looks like https://github.com/dbt-labs/dbt-core/issues/2645 was done to fix a particular thing for a particular Snowflake use case. The change was made in core, so it applies to all databases.
Later, https://github.com/dbt-labs/dbt-snowflake/pull/428/files made it possible to opt out of this behavior, but only for Snowflake, since that change lives in the Snowflake adapter.
There is a lot of good discussion in https://github.com/dbt-labs/dbt-core/issues/5489, and the solution there for a default out-of-the-box configuration is probably the right one, but a "quick fix" here for me for postgres could consist of either:

1. moving the "should I close?" logic (Snowflake's `reuse_connections` option) from the Snowflake adapter up to the top level so any adapter can opt in, or
2. changing the default out-of-the-box behavior so connections stay open.

From the discussion in https://github.com/dbt-labs/dbt-core/issues/5489 it looks like some other adapters may want to default to "close every time", so option 1 may be a quick, easy win that lets dbt users configure things in a way that works for them, without changing the current default behavior.
If you'd consider a patch for "move the 'should i close?' logic from the snowflake adapter to the top level", I'm happy to try out creating the patch.
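To make the first path concrete: a minimal sketch of what lifting the "should I close?" decision into a shared base class could look like. All class and attribute names below are hypothetical stand-ins, not dbt-core's actual API.

```python
# Hypothetical sketch: the close-or-keep decision moves into a shared
# base connection manager, gated by a per-profile reuse_connections
# flag, instead of living only in the Snowflake adapter.

class Credentials:
    def __init__(self, reuse_connections=False):
        self.reuse_connections = reuse_connections


class BaseConnectionManager:
    def __init__(self, credentials):
        self.credentials = credentials
        self.open_count = 0  # stand-in for real connection state

    def open(self):
        self.open_count += 1  # stand-in for opening a DB connection

    def release(self):
        # Core-level decision: only close when the profile did not opt
        # in to connection reuse (preserving today's default behavior).
        if not self.credentials.reuse_connections:
            self.close()

    def close(self):
        self.open_count -= 1


# With reuse_connections=True, release() keeps the connection open:
mgr = BaseConnectionManager(Credentials(reuse_connections=True))
mgr.open()
mgr.release()
print(mgr.open_count)  # 1: the connection survives release()
```

The default (`reuse_connections=False`) keeps the current close-on-release behavior, so existing adapters see no change.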
Thanks for outlining those two options forward @ckdake 🤩 !
And thank you for offering to work an implementation! I'd like to get a consultation with either @jtcohen6 or @Fleid before making a decision on which direction to go.
@jtcohen6 or @Fleid would love to get your preferences between these two paths forward:

1. Make the `reuse_connections` connection profile configuration available to all adapters.
2. Change the default behavior so connections are reused automatically.

It seems that option 1 would be the least risky (since it would default to `False`) while empowering users to make a choice that works for their use case.
But it would be nice if option 2 allowed all users to automatically receive a sensible default rather than needing to opt in to the behavior that would probably be best for them.
https://github.com/dbt-labs/dbt-snowflake/pull/428 added a `reuse_connections` option to connection profiles for dbt-snowflake only.
It is a boolean flag indicating whether to reuse idle connections to help reduce total connections opened. The default is `False`.
During node execution (such as models and tests), dbt opens connections against a Snowflake warehouse. Setting this configuration to `True` reduces execution time by verifying credentials only once for each thread.
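For context, that flag is set in `profiles.yml`. A sketch of a dbt-snowflake profile using it (profile name, account, and other values are placeholders):

```yaml
# Hypothetical profiles.yml excerpt; only dbt-snowflake understands
# reuse_connections today. All values below are placeholders.
my_profile:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: my_account
      user: my_user
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      database: analytics
      warehouse: transforming
      schema: dbt_dev
      reuse_connections: True  # keep idle connections open between nodes
```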
The purpose of closing open connections is so that the Snowflake thread can exit.
After https://github.com/dbt-labs/dbt-core/pull/2650, the `release` argument to `adapter.execute_macro` no longer has any effect. Instead, the connection is always closed. `close()` calls `_rollback()` if there is an open transaction.
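The close-with-rollback semantics described above can be sketched as follows. This is a simplified illustration with stand-in classes, not dbt-core's actual implementation.

```python
# Sketch of the described behavior: close() rolls back any open
# transaction before closing the underlying handle, so uncommitted
# work is never left dangling on the server side.

class FakeHandle:
    """Stand-in for a real database driver connection."""
    def __init__(self):
        self.rolled_back = False
        self.closed = False

    def rollback(self):
        self.rolled_back = True

    def close(self):
        self.closed = True


class Connection:
    def __init__(self, handle):
        self.handle = handle
        self.transaction_open = False
        self.state = "open"

    def _rollback(self):
        self.handle.rollback()
        self.transaction_open = False

    def close(self):
        if self.transaction_open:
            # Discard uncommitted work before tearing down the handle.
            self._rollback()
        self.handle.close()
        self.state = "closed"


conn = Connection(FakeHandle())
conn.transaction_open = True
conn.close()
print(conn.state)  # closed; the open transaction was rolled back first
```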
I think that short term, we're better going with option 1. We preserve the current behavior, add an option to switch, and minimize implementation risks.
I'm leaning that way because we are thinking of a deeper re-evaluation of how we manage connections during a run. I'd rather keep the breaking changes for when that time comes.
Sounds great @Fleid 👍
That would be awesome if you try out a PR for this one @ckdake:
@dbeatty10 PR up, but after some hefty timeboxing here I'm struggling a bit to figure out the right way to test this: I haven't been successful in triggering a passing test for the unchanged code that confirms a connection is closed after "doing something".
Awesome @ckdake !
Figuring out the testing approach will be a good thing to cover during the code review (which will be assigned to someone other than me). In the meantime, I added a comment about at least one thing you'll want to do to be ready for review: https://github.com/dbt-labs/dbt-core/pull/7587#issuecomment-1542426288
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
My PR to fix this was closed due to the company I was working at shutting down, but the fix in there should still work, and it still needs eyes from someone at dbt on the right testing approach. I'd love to see this merged, and my current dbt stack is probably going to need this in 6-12 months.
Is there any news on the force-new-connection option that was in the PR on this thread? We are using dbt 1.6 and see `Re-using an available connection from the pool (formerly model.A, now model.B)` in the logs after every model, and it is causing some marts to fail (we aren't sure why yet, but if we run just the marts by themselves they pass), so having the ability to force a new connection each time would be really useful.
Is there any news on the force-new-connection option that was in the PR on this thread?
https://github.com/dbt-labs/dbt-core/pull/7587 was closed before we were able to review it. If someone revived the contents of that PR, we'd still need it to contain working test cases.
We're not sure when we'd be able to prioritize working on this (either reviewing a PR or writing one ourselves). But we still agree that it feels important.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Is this a regression in a recent version of dbt-core?
Current Behavior
During a `dbt run`, dbt repeatedly closes and opens connections to the database in use. We are using IAM authentication for postgres, so credentials are only valid for 15 minutes from their creation, and our dbt runs are failing ~15 minutes in.
This behavior was changed while resolving https://github.com/dbt-labs/dbt-core/issues/2645. That issue impacts Snowflake, but the fix was made in a global way that impacts all databases.
A great fix here would be to apply this "force close" only to Snowflake.
Expected/Previous Behavior
When dbt starts up, each thread should open a connection to postgres using the credentials provided. The thread should maintain that connection to postgres until it catches an exception, or the thread terminates.
Steps To Reproduce
To really break things, use an RDS database with IAM credentials, and watch the reconnects cause failures due to expired credentials after ~15 minutes.
Relevant log output
No response
Environment
Which database adapter are you using with dbt?
postgres
Additional Context
I attempted to talk through this in slack here: https://getdbt.slack.com/archives/CMZ2Q9MA9/p1683290792489689