dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
10.01k stars 1.63k forks source link

[CT-3088] [Feature] Automatically grant usage on schema #8586

Open aBBDnGus opened 1 year ago

aBBDnGus commented 1 year ago

Is this your first time submitting a feature request?

Describe the feature

In dbt, one can specify which user has access to a model via grants = {'select': ['user']}. But the user can only access a view, if it is also granted usage to the schema where the model is located. Currently, one has to do this via a post-hook, e.g. grant usage on schema {{ model.schema }} to {{ model.config.grants.select|join(', ') }}. Unfortunately, this makes it impossible to parallelize the processes with threads, because the post-hook is applied concurrently on the same schema and the error tuple concurrently updated appears.

Describe alternatives you've considered

Who will this benefit?

More complex model structures, including multiple schemas and users.

Are you interested in contributing this feature?

No response

Anything else?

No response

dataders commented 1 year ago

@aBBDnGus are you using Snowflake? Is so, I feel that this is already covered in https://github.com/dbt-labs/dbt-snowflake/issues/715. I do agree that we should address this!

My suggestion is that we first solve this within the context Snowflake, then upstream the solution into something more general.

One question I have (perhaps you can help me here)

  1. given that grants shipped 9 months ago as part of 1.3.0, why have we not heard more users asking for this?
  2. what should the best practice permission model be for granting access to end users? Why not grant USAGE on all databases where end users are given SELECT access to particular tables?

For 1, my hunch is that this GRANT USAGE only happens within a minory of data teams, for reasons like:

For 2, I'd love to get some input from users as to how to use grants in combination with principle of least privilege.

aBBDnGus commented 1 year ago

I'm using PostgreSQL.

For 1: I think, this is not a major problem to many users. It arises only when using multiple schemas and rights management (i.e. maintain application specific tool accounts which have access to specific models). Even then, one can circumvent this by manually granting the usage when a new schema is created.

For 2: In my opinion, it is the best solution to grant usage on a schema to all users which have access to one of the contained models. The dbt process creates schemas automatically, but it does not grant access.

jtcohen6 commented 1 year ago

I think some of the limitation here is that dbt doesn't have a first-class construct for defining & configuring schemas. Some related conversation in this previous discussion:

The proposal in this issue is actually closer to dbt's current behavior: If a model needs to land in schema_x, and schema_x doesn't yet exist, just go ahead and create the schema. Similarly, if a model in schema_x needs to grant select to a specific role, just go ahead and grant usage to schema_x, too.

That's more or less what Doug & I cooked up for the over-clever approach here: https://docs.getdbt.com/blog/configuring-grants#option-b-too-clever-by-half

dataders commented 1 year ago

adding @chwiese's take from an internal Slack thread

I do not think dbt should automatically set usage (or any other grants) on schemata. The key reason is that dbt can not (out of the box) manage schema RBAC in its entirety so teams should already have a solution for that in place and dbt should not interfere with it. These alternative solutions are likely also the reason why this has not frequently come up yet. IMHO the situation would be different if dbt offered more explicit options for managing schemata like @jtcohen6 mentions in #5781. That being said, for Snowflake we would then have the same situation one level further up with databases.

I agree that implementation depends on whether or not dbt formally manages schemas and database objects, or if they remain as they are today as effectively attributes of models

aBBDnGus commented 10 months ago

We have solved the original problem in the meantime with setting

+post-hook:
  - sql: "select pg_advisory_xact_lock(1); grant usage on schema {{ model.schema }} to {{ model.config.grants.select|join(', ') }}"
    transaction: false

The advisory lock prevents that the command is executed on the same schema for multiple times and the error tuple concurrently updated appears.

ustulation commented 9 months ago

I ran into this recently while using AWS Quicksight which needed to access tables from Redshift. Quicksight user needs to be granted usage on the schema where the tables (or views) reside. These schemas are created by dbt (due to the +schema field under the appropriate section in models in dbt_project.yml) but unfortunately it seems to lack the ability to grant usage to a user on it. So did it with a post-hook on a randomly chosen model in the schema because if that model has run successfully it means that the schema creation has already been done. However it feels a bit of a hack.

kdazzle commented 8 months ago

+1 Setting grants in the schema config like a model seems like it would make the most sense. Given that dbt creates schemas, it becomes harder to say that it isn't dbt's responsibility.

models:
  my_dbt:
    marts:
      vehicles:
        +schema: vehicles
          config:
            grants:
              select: ["analyst"]
eirikmag commented 7 months ago

+1

tejas-esto commented 5 months ago

+1

stolomeo911 commented 4 months ago

+1