dbt-labs / dbt-snowflake

dbt-snowflake contains all of the code enabling dbt to work with Snowflake
https://getdbt.com
Apache License 2.0
289 stars 175 forks source link

[CT-852] [CT-836] [Bug] externalbrowser authenticator opens a browser tab for each dbt thread #198

Closed Tolsto closed 1 year ago

Tolsto commented 2 years ago

Is there an existing issue for this?

Current Behavior

Running dbt with Snowflake and authenticator: externalbrowser will open a browser login tab for each dbt thread.

Expected Behavior

Only a single browser window/tab should open regardless how many threads are configured.

Steps To Reproduce

Execute dbt run with the following config (assuming you have at least 4 models to utilize all threads)

org:
  outputs:
    dev:
      [...]
      authenticator: externalbrowser
      threads: 4
      type: snowflake

Relevant log output

No response

Environment

- OS: MacOS 12.4 (arm64)
- Python: 3.9.13
- dbt: 1.1.0 (snowflake plugin 1.1.0)

What database are you using dbt with?

snowflake

Additional Context

No response

jtcohen6 commented 2 years ago

Thanks for opening @Tolsto! I'm going to transfer this to the dbt-snowflake repository, since that's where the fix would be needed.

jtcohen6 commented 2 years ago

@Tolsto Sorry you're running into this โ€”ย it's definitely not the intended behavior, nor the behavior I've been seeing while running dbt-snowflake on my own machine, also macOS 12.4 with M1 (arm64). It's also not logic that really lives in dbt-snowflake โ€”ย it's being passed through to the authentication mechanism defined in snowflake-connector-python.

dbt does open separate connections for each thread, and doesn't reuse connections between threads. For some authentication methods, each connection requires its own auth handshake (https://github.com/dbt-labs/dbt-snowflake/issues/201).

If you're using externalbrowser-based authentication, though, you should be able to cache the authentication token between runs (and between threads in the same run) by using keyring / macOS Keychain Access.

Tolsto commented 2 years ago

The caching works but only after the first invocation of dbt (and authentication). As soon as the token expires, dbt shows the mentioned behaviour. I should add that several of my co-workers experience the same issues (on non-M1 Macs).

jtcohen6 commented 2 years ago

@Tolsto Ah, okay, got it! Glad to hear the caching works generally, and that this is just an issue when re-authenticating.

Are you materializing your models in multiple logical databases? That is, do you have multiple database values configured for models in your project?

I've been able to reproduce this behavior by configuring multiple. At the start of each invocation, dbt runs caching queries in its main thread: show terse schemas in database {database}. If you have multiple databases, it runs a show query for each one, and it runs them in parallel. This is what I see in my debug-level logs (logs/dbt.log) when two browser windows open:

[0m08:38:19.105985 [debug] [ThreadPool]: On list_dbt_test: /* {"app": "dbt", "dbt_version": "1.3.0b1", "profile_name": "sandbox-snowflake", "target_name": "dev", "connection_name": "list_dbt_test"} */
show terse schemas in database dbt_test
    limit 10000
08:38:19.106283 [debug] [ThreadPool]: Acquiring new snowflake connection "list_ad_hoc"
08:38:19.106415 [debug] [ThreadPool]: Opening a new connection, currently in state init
08:38:19.107906 [debug] [ThreadPool]: Using snowflake connection "list_ad_hoc"
08:38:19.108326 [debug] [ThreadPool]: On list_ad_hoc: /* {"app": "dbt", "dbt_version": "1.3.0b1", "profile_name": "sandbox-snowflake", "target_name": "dev", "connection_name": "list_ad_hoc"} */
show terse schemas in database ad_hoc
    limit 10000
08:38:19.108419 [debug] [ThreadPool]: Opening a new connection, currently in state init
Tolsto commented 2 years ago

Yes, we have multiple databases.

yingqiaozheng commented 2 years ago

Is this issue fixed in any version of dbt-snowflake yet? Or is there any workaround by updating the local configuration? or anything else?

JudyCamps commented 1 year ago

I am also looking for a work-around to this. I can't run dbt core due to this authentication issue. I see a similar ticket here https://github.com/dbt-labs/dbt-snowflake/issues/208 CT-880

yingqiaozheng commented 1 year ago

I had this issue when I install dbt using brew, then I uninstall the dbt and re-install it using pip, the issue was gone. I thing the installation packages are different between them . Give it a try and good luck .

JudyCamps commented 1 year ago

Thanks @yingqiaozheng . I removed the brew version and re-installed with pip. Unfortunately, I have the same behavior - infinite loop of tabs opening affirming the snowflake connection, but dbt doesn't seem to accept it.

almekpoh commented 1 year ago

I have the same behaviour, and it's a bit annoying! Doing a fix on this ?๐Ÿ™๐Ÿพ๐Ÿ™๐Ÿพ or a workaround?

system1-dersek commented 1 year ago

We're also seeing a ton of browser tabs opened. Looking at the dbt.log, it appears that it's opening a separate connection per database, per schema. It seems like it's not just the model materialization databases its considering, but also the databases of sources as well - that or our project is referencing another project.

Maybe it's reasonable to require that each 'database' is a different connection in non-snowflake databases, but since snowflake databases are more like a schema, and schema like a sub-schema, it feels like there's an assumption that isn't holding up well. (Maybe snowflake's additional scoping has lead to us organizing data in a way that other databases would consider an antipattern, which makes it tricky for a multi-database tool like dbt)

Could dbt treat a snowflake account as a single database for connection purposes and switch between databases/schemas/roles in sql when something needs to run in a specific context? I think that would still require a connection per-thread, but it's a lot better than per-database, per-schema by about 2 orders of magnitude in our case.

Another alternative might be for snowflake to provide a way to get a time-limited token back from a general connection that could be used to create new connections without needing to use the original authentication method.

system1-dersek commented 1 year ago

Looks like the limited time token already exists: https://docs.snowflake.com/en/sql-reference/parameters.html#label-allow-id-token

https://docs.getdbt.com/reference/warehouse-setups/snowflake-setup#sso-authentication Once I had pip installed keyring, DBT was able to make use of this to only create a single tab for SSO once ALLOW_ID_TOKEN was enabled on our account.

Sounds like Tolsto already is using the keyring approach, so the issues with expiration may still apply

Fleid commented 1 year ago

I'm a bit puzzled about what we can do here. It looks like dbt-snowflake is behaving as expected?

@Tolsto do you still experience the issue? Have you tried this workaround? Is your Snowflake account set up for connection caching?

Tolsto commented 1 year ago

I'm a bit puzzled about what we can do here. It looks like dbt-snowflake is behaving as expected?

@Tolsto do you still experience the issue? Have you tried this workaround? Is your Snowflake account set up for connection caching?

Still have the problem, connection caching is activated. As far as I can tell, the workaround is for people whose keyring integration doesn't work. That's not the case for me, it just tries to authenticate 4 times when I run dbt. Each successful authentication then saves the token in the keychain. A possible solution might be to do a single Snowflake query to trigger the authentication and only then proceed with executing any further queries as the token is then cached in the keyring and no further authentications would be required.

Fleid commented 1 year ago

That makes sense, thank you.

I'll mark that as an enhancement then. The code is working as expected, but it needs to be changed to optimize the behavior. How painful would you say this is to you in using dbt? The scale starting at paper cut, all the way to making you question the state of software engineering in 2023? ;)

Tolsto commented 1 year ago

It's mildly annoying. However, I have to explain to all new dbt users why this is happening as everybody thinks that something is wrong with their setup.

github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] commented 1 year ago

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

samkessaram commented 7 months ago

Commenting to signify continued interest; this would greatly improve the development experience!

markproctor1 commented 7 months ago

This is an ongoing annoyance! We use around 20 database/schema combinations and each opens a tab.

Not show stopping but definitely highest annoyance.

Xemptuous commented 7 months ago

I am also having this issue with externalbrowser auth with Snowflake.

We use 4 threads, and on first connect we get 4 browser tab popups for auth - of which only one or so can actually work - but after that it's good to go. Problem is we have to pkill dbt and rerun for it to work. This auth lasts for a while until the token expires.

My team's solution has become to limit threads in dev to 1, but this limits performance. Would love to see a PR to address this!

busbina commented 4 months ago

I set reuse_connections to True and now it opens 3 tabs, instead of 83, two of which look like they are for the history. Still a bug, but at least its manageable.

ShahNewazKhan commented 1 week ago

+1