dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.61k stars 1.59k forks source link

[CT-2455] [Feature] Less strict secret env vars: allow elsewhere besides just `profiles.yml` + `packages.yml` #7425

Closed jeremyyeo closed 10 months ago

jeremyyeo commented 1 year ago

Is this your first time submitting a feature request?

Describe the feature

I think this one just calls for a reversal of the decision made in https://github.com/dbt-labs/dbt-core/issues/4310

One example where you might want to use secret env vars is when using Snowflake's encrypt_x / decrypt_x functions... e.g.

-- models/pii.sql
select encrypt('john', '{{ env_var("DBT_ENV_SECRET_PWD") }}') as name

Which due to the issue/pr above results in the expected error:

Parsing Error
  Secret env vars are allowed only in profiles.yml or packages.yml. Found 'DBT_ENV_SECRET_PWD' referenced elsewhere.

Describe alternatives you've considered

I don't think any good alternatives exist.

  1. You use a normal var - which exposes the passphrase in plaintext.
  2. On snowflake - you could store the passphrase in a table - setup datamasking/role based access and what not - kinda convoluted.

Who will this benefit?

Anyone wanting to use secret vars in places that are not just those 2 files.

Are you interested in contributing this feature?

Sure

Anything else?

No response

jtcohen6 commented 1 year ago

@jeremyyeo There's a workaround for this today:

-- models/pii.sql
select encrypt('john', '{{ env_var("SECRET_PWD") }}') as name

The value will be scrubbed from dbt's logs:

$ SECRET_PWD=abc123 DBT_ENV_SECRET_PWD=abc123 dbt compile -s pii
12:20:52  Running with dbt=1.5.0-rc2
...
12:20:55  Compiled node 'pii' is:
-- models/pii.sql
select encrypt('john', '*****') as name
create or replace   view analytics.dbt_jcohen.pii

   as (
    -- models/pii.sql
select encrypt('john', '*****') as name
  );

And for a completely separate reason, they will also be scrubbed from Snowflake's query logs, because of the nature of the function (encrypt) being used:

Screenshot 2023-04-21 at 14 07 56 Screenshot 2023-04-21 at 14 16 06

The downside of this approach is, by also storing the value in a non-secret env var, there's nothing preventing someone with development access to the same project from writing that value to a database table (outside a function like encrypt that Snowflake knows to scrub), and then querying the table to read it back out. That's not possible with a "secret" env var due to the intentional limitations around where it can be used/rendered.

Longer term, we should really reconsider how we treat secrets (#6353). Currently, it's just a special behavior for scrubbing a value from the dbt (logs) if it matches an environment variable with a certain prefix.

I'm going to close this specific ask as a wontfix in the meantime.

jeremyyeo commented 1 year ago

I guess this may be one that could be more dbt Cloud related? But is directly the result of dbt-core not supporting secrets in other places (one of those "cross-functional" thingies =]). Why would one use secret env vars in the first place? a) You don't want it logged in logs. b) You don't want all dbt developers to know the secret value in the first place.

I think what you mentioned addresses (a) but not (b)?

jeremyyeo commented 1 year ago

I wrote this elsewhere but worth putting down here too - recapping why simply putting env vars in model files doesn't solve the problem at hand.

Let's assume we want to use the encrypt function (for simplicity).

And assuming a toy example:

-- some_model.sql
select encrypt('john', 'correct horse battery staple') as name

Okay, so when we build this table in Snowflake - what DDL/DML does dbt send? First things first... dbt has to compile the above model into working and proper SQL:

-- target/compiled/some_model.sql
create table some_model as 
select encrypt('john', 'correct horse battery staple') as name;

Now... let's assume for a second that dbt can have secret env vars in model files:

-- some_model.sql
select encrypt('john', '{{ env_var("DBT_ENV_SECRET_PASSWORD") }}') as name

Now... what happens?

Firstly, dbt will always have to compile the sql model file and output it to the target folder - just like above. So - if dbt were to MASK the values in the compiled SQL:

-- target/compiled/some_model.sql
create table some_model as 
select encrypt('john', '!!!!*****!!!!') as name;

^ Then you table some_model will literally be using the string '!!!!*****!!!!' as the encryption passphrase - as opposed to using the env var set by DBT_ENV_SECRET_PASSWORD (which would have been equal to the string 'correct horse battery staple').

Because dbt always always has to compile to valid SQL first - that's kind of the rationale that secret env vars were explicitly blocked from model files.


Separately I went about testing out using Snowflake Data Masking for this.

image

I created a secret table - whose value is only visible to TRANSFORMER and notDEVELOPER - I then tried to use the DEVELOPER role to try and encrypt it's value.

image (1)

It failed with the passphrase 'correct horse battery staple'.

image (2)

But succeeded with the passphrase that was 5 asterisk - the masked value. Meaning it had been encrypted not with 'correct horse battery staple' but '*****' - so unfortunately this looks like a dead end as well unfortunately :(

github-actions[bot] commented 10 months ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] commented 10 months ago

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.