Open jtcohen6 opened 9 months ago
I agree with the proposed solution, and for our company, with a dbt mesh implementation, protected with a defined yeslist (I suggest protected_whitelist) will be of more use than straight up public models.
I also agree that I would not expect to find any models with a protected_yeslist in the list of public models in dbt explore.
Is there value in considering this question from the project level, rather than the model level?
My use case might be slightly different, in that I'm thinking about model access as a way to shape our account-level project lineage graph. I'm happy with full transparency around metadata and discoverability, but I'd like to have a mechanism to guide the relationships created through cross-project ref.
With cycle detection at the project level, I've been trying to optimize our cross-project graph the same way I'd optimize a single project's. Do the same modeling concepts apply? The ones I hold close are those defined in dbt project evaluator, which overlap with those from the inherited project refactoring session at Coalesce 2022.
Let me know if I'm too far afield here, but I think I'd prefer to have a way to define which projects can access all of a given project's public models, rather than determining cross-project access at the model level.
@katieclaiborne Thanks for thinking through this!
In the case of:
project_a --> project_b --> project_c
It sounds like you want a way to say, "The models in project_c should never be able to reach out and access the public models in project_a."
I have a few more questions, if you're willing to humor me:
- […] project_a?
- […] project_b can access) or as a "nolist" (every project except project_c can access)?
Of course!
It's an expectation I'd imagine putting in place within project_a, as a "yeslist". I wondered about having a references.yml file, as a companion to dependencies.yml. The first would define which projects are allowed to reference the root project, just as the second defines which projects the root project references.
Yes, node-level access restrictions would feel more appropriate to me if dbt were to support cycle detection at the node level. I've also wondered whether model groups could serve as a middle ground!
I'm wrestling with how to observe and evaluate our account-level project relationships. The project DAG in Explorer is great, but some of the emerging relationships have my pattern recognition brain going haywire, when really, there may not be cause for concern.
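To make the references.yml idea above concrete, here is a minimal Python sketch of a project-level "yeslist" check. Everything in it is an assumption for illustration: dbt has no references.yml file, and the allowed_referrers key, the REFERENCES dict, and the project_may_reference function are hypothetical names. It also assumes that a project with no entry places no project-level restriction on who references it.

```python
# Hypothetical: what a references.yml per producer project might contain,
# already parsed into Python dicts. Neither the file nor the
# "allowed_referrers" key exists in dbt today.
REFERENCES = {
    "project_a": {"allowed_referrers": ["project_b"]},
    "project_b": {"allowed_referrers": ["project_c"]},
}


def project_may_reference(consumer: str, producer: str, references: dict) -> bool:
    """Return True if `consumer` may ref the public models of `producer`."""
    if consumer == producer:
        return True  # a project can always reference its own models
    rules = references.get(producer)
    if rules is None:
        # Assumption: no entry means no project-level restriction.
        return True
    return consumer in rules.get("allowed_referrers", [])
```

With these rules, project_b can reference project_a, but project_c cannot, which matches the project_a --> project_b --> project_c case discussed earlier in the thread.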
@katieclaiborne Following up from our conversation last week! It feels like we were getting at a distinction between:
It feels to me like the proposal in this issue is in keeping with that distinction:
- […] public models by anyone
- […] protected models for specific downstream use
This also feels in keeping with the (loose) inspiration we're taking from other object-oriented languages, where "protected" means within same package and/or "friend" classes.
I think this is what that might look like in practice:
# common_staging_project/dbt_project.yml
models:
  common_staging_project:
    staging:
      finance_stuff:
        +access: protected
        # should we call this 'derived_projects', or 'friends' ? :)
        +protected_yeslist: ['finance']
      marketing_stuff:
        +access: protected
        +protected_yeslist: ['finance']
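As a rough illustration of how a ref to a protected model might be validated under this config, here is a Python sketch. It is not dbt's actual validation logic: the Model class and the protected_ref_allowed function are hypothetical, protected_yeslist is only the working name proposed in this issue, and private models are left out because they also depend on groups.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Model:
    name: str
    project: str
    access: str = "protected"
    # Proposed config from this issue; the name is not final.
    protected_yeslist: List[str] = field(default_factory=list)


def protected_ref_allowed(target: Model, referencing_project: str) -> bool:
    """Check a ref to a public or protected model (hypothetical helper)."""
    if target.access == "public":
        return True
    if target.access == "protected":
        # "Only this project, plus specific other projects if/as declared."
        return (
            referencing_project == target.project
            or referencing_project in target.protected_yeslist
        )
    raise ValueError(f"access level not covered by this sketch: {target.access}")
```

A model in finance_stuff with protected_yeslist ['finance'] would then be referenceable from common_staging_project and from finance, and from nowhere else.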
I've also wondered whether model groups could serve as a middle ground!
Is your idea here that, rather than defining this as a new config (protected_yeslist), so long as the group config matches across both projects, then the reference to a protected model in the other project is allowed? (Thanks to @jenna-jordan's comment here which helped this click for me.)
That's an interesting idea!
- […] protected models (rather than private)
The upshot of that change would be:
- private models can be referenced by other models in the same namespace (project/package) AND same group
- protected models can be referenced by other models in the same namespace (project/package) OR same group
- public models can be referenced anywhere (any namespace, any group, etc.)
Yes, I like it! To be honest, I hadn't thought through the groups implementation that far. Thanks to you and Jenna for articulating an elegant design.
My mind immediately goes to how we might visualize groups as they exist across projects (as in a slightly more granular version of the project graph in dbt Explorer), but that's well beyond the scope of this issue.
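The three group-based rules proposed above (private: same namespace AND same group; protected: same namespace OR same group; public: anywhere) can be written down as a small Python sketch. The ref_allowed function and its parameters are hypothetical and only restate the rules as listed in this thread; they do not describe dbt's current behavior. The sketch assumes a model with no group never matches another model's group.

```python
from typing import Optional


def ref_allowed(
    access: str,
    target_project: str,
    target_group: Optional[str],
    source_project: str,
    source_group: Optional[str],
) -> bool:
    """Apply the proposed group-aware access rules (hypothetical helper)."""
    same_namespace = source_project == target_project
    # Assumption: an ungrouped target never counts as "same group".
    same_group = target_group is not None and source_group == target_group

    if access == "public":
        return True
    if access == "protected":
        return same_namespace or same_group
    if access == "private":
        return same_namespace and same_group
    raise ValueError(f"unknown access level: {access}")
```

Under these rules, a protected model in one project's finance group could be referenced by a finance-group model in another project, while a private model could not.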
The more I think about this:
- […] group config.
- […] group with the same name, and start using them.)
To reconcile these two requirements, I think the producer-side group needs an additional attribute. Following the existing example, this could look like:
# common_staging_project/models/groups.yml
groups:
  - name: finance
    owner:
      email: zach.jaff@jaffleshop.com
      name: Zach Jaff
    projects: # default: this project only
      - common_staging_project
      - jaffle_shop_mesh_finance
What does this mean?
- […] finance group "extends" across both the common_staging_project and the jaffle_shop_mesh_finance
- […] jaffle_shop_mesh_finance) is named in the producer project's group, then models in the consumer project which also belong to the same group can reference its protected models
- […] group + owner, because these are already defined in the upstream project. If these projects tend to be maintained by different teams, the upstream project is saying, "These models are mostly relevant to (if not also directly owned by) the downstream team." This describes some of the hub-and-spoke patterns we're seeing in practice.
- […] private models. For references to private models, both the namespace (project/package) and the group must match.
I see the primary risk of this approach as overloading (and confusing) the groups feature. Right now, groups are always a subset of project namespaces. (Even for this, there's already an exception: installed packages with restrict-access: False.) The idea that groups can extend across projects makes the diagrams more complicated, and the concept of ownership potentially more confusing.
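For anyone trying to picture the producer-side projects attribute, here is a minimal Python sketch of the check described above, under stated assumptions. The Group class, its home_project field, and the protected_ref_allowed function are hypothetical; the projects attribute is only the one proposed in this comment, not an existing dbt feature.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Group:
    name: str
    home_project: str  # the producer project that defines the group
    # Proposed attribute; default is "this project only".
    projects: List[str] = field(default_factory=list)

    def extends_to(self, project: str) -> bool:
        return project == self.home_project or project in self.projects


def protected_ref_allowed(
    producer_group: Optional[Group],
    target_project: str,
    source_project: str,
    source_group_name: Optional[str],
) -> bool:
    """Can a model in source_project ref a protected model in target_project?"""
    if source_project == target_project:
        return True  # same namespace: allowed, as for protected models today
    if producer_group is None:
        return False  # ungrouped protected models stay project-only
    # Cross-project: the consumer project must be named by the producer's
    # group, and the referencing model must belong to a group of that name.
    return (
        source_group_name == producer_group.name
        and producer_group.extends_to(source_project)
    )
```

With the finance group from the example, a finance-group model in jaffle_shop_mesh_finance could reference protected models in common_staging_project, while a project that is not listed under projects could not, even if it declared a group with the same name.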
Is this your first time submitting a feature request?
Describe the feature
prompted by Slack conversation with @eivind-stb
There are currently three access modifiers that determine where a model can be ref'd:
- private: can only be referenced by resources in the same group
- protected: can only be referenced by resources in the same project/package
- public: can be referenced anywhere (including other projects/packages)
I've heard from a number of folks who want something in between protected + public: "public, but limited to this project + [project x, project y]"
Rather than a "downgrade" of public, I think this is actually an "upgrade" of protected: "only this project, plus specific other projects [if/as declared]"
I'm imagining a new model config, which I'll call protected_yeslist for now (naming suggestions welcome!):
This should work for both types of cross-project references:
- project dependencies (docs - a feature of dbt Cloud Enterprise)
- package dependencies with restrict-access: True (docs)
As a starting point, we'd want to update the logic here that determines whether a ref to a protected model is valid.
Thinking about complementary experiences (dbt Explorer): If a model is protected, I would not expect it to be discoverable by anyone (as truly public models are). It's available on a "need-to-know" / "need-to-ref" basis.
Describe alternatives you've considered
Modifying the behavior of public models rather than private models. This seems to be most people's first intuition when asking for this capability. But I find the extension of protected much more satisfying! As @eivind put it:
This gets us to that desired nomenclature, without any change to existing behavior.
Not doing this. Model access in dbt is really about access to metadata (discoverability); it shouldn't be mistaken for access on the underlying data, which is managed via grants (or more granular access policies) within the underlying data platform. Though this can be facilitated by dbt, via the grants config, it's not dbt's ultimate responsibility.
Who will this benefit?
Folks adopting dbt Mesh who want to limit metadata access to specific models/projects
Are you interested in contributing this feature?
always :) just a question of timing!
Anything else?
No response