Closed Tolsto closed 1 year ago
Thanks for opening @Tolsto! I'm going to transfer this to the dbt-snowflake
repository, since that's where the fix would be needed.
@Tolsto Sorry you're running into this โย it's definitely not the intended behavior, nor the behavior I've been seeing while running dbt-snowflake
on my own machine, also macOS 12.4 with M1 (arm64). It's also not logic that really lives in dbt-snowflake
โย it's being passed through to the authentication mechanism defined in snowflake-connector-python
.
dbt does open separate connections for each thread, and doesn't reuse connections between threads. For some authentication methods, each connection requires its own auth handshake (https://github.com/dbt-labs/dbt-snowflake/issues/201).
If you're using externalbrowser
-based authentication, though, you should be able to cache the authentication token between runs (and between threads in the same run) by using keyring
/ macOS Keychain Access.
dbt-snowflake
in a new virtual environment, and re-running?keyring
installed in your Python environment running dbt?
$ pip freeze | grep 'keyring'
keyring==21.8.0
The caching works but only after the first invocation of dbt (and authentication). As soon as the token expires, dbt shows the mentioned behaviour. I should add that several of my co-workers experience the same issues (on non-M1 Macs).
@Tolsto Ah, okay, got it! Glad to hear the caching works generally, and that this is just an issue when re-authenticating.
Are you materializing your models in multiple logical databases? That is, do you have multiple database
values configured for models in your project?
I've been able to reproduce this behavior by configuring multiple. At the start of each invocation, dbt runs caching queries in its main thread: show terse schemas in database {database}
. If you have multiple databases, it runs a show
query for each one, and it runs them in parallel. This is what I see in my debug-level logs (logs/dbt.log
) when two browser windows open:
[0m08:38:19.105985 [debug] [ThreadPool]: On list_dbt_test: /* {"app": "dbt", "dbt_version": "1.3.0b1", "profile_name": "sandbox-snowflake", "target_name": "dev", "connection_name": "list_dbt_test"} */
show terse schemas in database dbt_test
limit 10000
[0m08:38:19.106283 [debug] [ThreadPool]: Acquiring new snowflake connection "list_ad_hoc"
[0m08:38:19.106415 [debug] [ThreadPool]: Opening a new connection, currently in state init
[0m08:38:19.107906 [debug] [ThreadPool]: Using snowflake connection "list_ad_hoc"
[0m08:38:19.108326 [debug] [ThreadPool]: On list_ad_hoc: /* {"app": "dbt", "dbt_version": "1.3.0b1", "profile_name": "sandbox-snowflake", "target_name": "dev", "connection_name": "list_ad_hoc"} */
show terse schemas in database ad_hoc
limit 10000
[0m08:38:19.108419 [debug] [ThreadPool]: Opening a new connection, currently in state init
Yes, we have multiple databases.
Is this issue fixed in any version of dbt-snowflake yet? Or is there any workaround by updating the local configuration? or anything else?
I am also looking for a work-around to this. I can't run dbt core due to this authentication issue. I see a similar ticket here https://github.com/dbt-labs/dbt-snowflake/issues/208 CT-880
I had this issue when I install dbt using brew, then I uninstall the dbt and re-install it using pip, the issue was gone. I thing the installation packages are different between them . Give it a try and good luck .
Thanks @yingqiaozheng . I removed the brew version and re-installed with pip. Unfortunately, I have the same behavior - infinite loop of tabs opening affirming the snowflake connection, but dbt doesn't seem to accept it.
I have the same behaviour, and it's a bit annoying! Doing a fix on this ?๐๐พ๐๐พ or a workaround?
We're also seeing a ton of browser tabs opened. Looking at the dbt.log, it appears that it's opening a separate connection per database, per schema. It seems like it's not just the model materialization databases its considering, but also the databases of sources as well - that or our project is referencing another project.
Maybe it's reasonable to require that each 'database' is a different connection in non-snowflake databases, but since snowflake databases are more like a schema, and schema like a sub-schema, it feels like there's an assumption that isn't holding up well. (Maybe snowflake's additional scoping has lead to us organizing data in a way that other databases would consider an antipattern, which makes it tricky for a multi-database tool like dbt)
Could dbt treat a snowflake account as a single database for connection purposes and switch between databases/schemas/roles in sql when something needs to run in a specific context? I think that would still require a connection per-thread, but it's a lot better than per-database, per-schema by about 2 orders of magnitude in our case.
Another alternative might be for snowflake to provide a way to get a time-limited token back from a general connection that could be used to create new connections without needing to use the original authentication method.
Looks like the limited time token already exists: https://docs.snowflake.com/en/sql-reference/parameters.html#label-allow-id-token
https://docs.getdbt.com/reference/warehouse-setups/snowflake-setup#sso-authentication Once I had pip installed keyring, DBT was able to make use of this to only create a single tab for SSO once ALLOW_ID_TOKEN was enabled on our account.
Sounds like Tolsto already is using the keyring approach, so the issues with expiration may still apply
I'm a bit puzzled about what we can do here.
It looks like dbt-snowflake
is behaving as expected?
@Tolsto do you still experience the issue? Have you tried this workaround? Is your Snowflake account set up for connection caching?
I'm a bit puzzled about what we can do here. It looks like
dbt-snowflake
is behaving as expected?@Tolsto do you still experience the issue? Have you tried this workaround? Is your Snowflake account set up for connection caching?
Still have the problem, connection caching is activated. As far as I can tell, the workaround is for people whose keyring integration doesn't work. That's not the case for me, it just tries to authenticate 4 times when I run dbt. Each successful authentication then saves the token in the keychain. A possible solution might be to do a single Snowflake query to trigger the authentication and only then proceed with executing any further queries as the token is then cached in the keyring and no further authentications would be required.
That makes sense, thank you.
I'll mark that as an enhancement then. The code is working as expected, but it needs to be changed to optimize the behavior. How painful would you say this is to you in using dbt? The scale starting at paper cut, all the way to making you question the state of software engineering in 2023? ;)
It's mildly annoying. However, I have to explain to all new dbt users why this is happening as everybody thinks that something is wrong with their setup.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Commenting to signify continued interest; this would greatly improve the development experience!
This is an ongoing annoyance! We use around 20 database/schema combinations and each opens a tab.
Not show stopping but definitely highest annoyance.
I am also having this issue with externalbrowser auth with Snowflake.
We use 4 threads, and on first connect we get 4 browser tab popups for auth - of which only one or so can actually work - but after that it's good to go. Problem is we have to pkill dbt and rerun for it to work. This auth lasts for a while until the token expires.
My team's solution has become to limit threads in dev to 1, but this limits performance. Would love to see a PR to address this!
I set reuse_connections to True and now it opens 3 tabs, instead of 83, two of which look like they are for the history. Still a bug, but at least its manageable.
+1
Is there an existing issue for this?
Current Behavior
Running dbt with Snowflake and
authenticator: externalbrowser
will open a browser login tab for each dbt thread.Expected Behavior
Only a single browser window/tab should open regardless how many threads are configured.
Steps To Reproduce
Execute
dbt run
with the following config (assuming you have at least 4 models to utilize all threads)Relevant log output
No response
Environment
What database are you using dbt with?
snowflake
Additional Context
No response