Open 0xRobin opened 1 year ago
We also ran into this one for exactly the same use case. In our CICD we build all parents of changed_model
and one level of child models. This gives us a good indicator if anything has broken downstream without having to build all children which, when you're using snowflake and have 1500+ models can make your CICD pipeline both slow and expensive.
The workaround we'll likely employ is to introduce some code that queries graph.pickle or uses dbt list -s changed_model+1
first in order to get the child models and then using that data, run dbt like this:
i.e.
dbt build -s +changed_model+1 +child_1 +child_2 ... +child_n
So not impossible to workaround but ugly and may have problems when the command line args get very big
Only suggestion I would have is to maybe use syntax @changed_model+1
for consistency.
@0xRobin @whisperstream Thanks for the idea, and for weighing in!
I feel like getting the syntax right will be the trickiest bit by far. Should 1@changed_model
mean:
To @whisperstream's point, you can do this with a series of list
commands, and more ergonomically if you're wrapping dbt-core
's Python entry point (new in v1.5).
Although in CI, there should be no need to build all parents of changed_model
if you're using --defer
or clone
(which will be new in v1.6!).
@jtcohen6 thanks for the additional info.
From my understanding @changed_model
currently works as:
Which could also be expressed as +(changed_model+)
as some kind of nested graph selection. Ideally we could control both the children and the parent depth.
I think it's important here to follow the convention of having parent depth before the model and child depth after the model..
usage proposal:
@changed_model+
(equivalent as @changed_model
)
all children, and all their parents1@changed_model+
(equivalent as1@changed_model
)
all children, and their first-order parents@changed_model+1
first-order children, and all their parents1@changed_model+1
first-order children, and their first-order parentsI think @0xRobin proposed syntax seems straight-forward and intuitive.
One thing I've never actually been sure about is the current implementation of @changed_model
- "the parents of the children" - does "the parents" mean only first-order parents or all (great)(grand)parents? This could be clarified with a small change to the documentation.
I think it's important here to follow the convention of having parent depth before the model and child depth after the model..
Agree!
The order of operations for resolving the @
operator: first we select the children, and second we select their parents. So it is a bit like reading from right to left.
I'm on board with the syntax that @0xRobin is proposing. And to @foundinblank's point, there's an opportunity to freshen up our docs on @
. I'll tag this one a good first issue.
Likely starting point: Add parents_depth
and children_depth
arguments to select_childrens_parents
, and pass them in accordingly.
And add some more tests:
@0xRobin Hey! Just reading this part of the issue:
Currently we can achieve this with the @ operator and a resource-type filter but this also includes any seeds that are used in tests of any of the child models of the specified model, which are not necessary for us and increases run times.
Is this by any chance related to using one of the unit testing packages? I ask because we're working on building unit testing as native functionality in dbt-core. Would love to hear your thoughts if so, here's the discussion!
Is this by any chance related to using one of the unit testing packages?
Hi @graciegoheen, thanks for pointing out that interesting discussion. We're not using any external unit testing package. The tests we are running are a combination of seeds and a custom generic test to perform a seed equality check. Our problem arises from our setup where the seeds aren't materialized in production and are only materialized in our slim CI when they are modified. This results in the seeds not being accessible in the slim CI when you later modify the model but not the seed.
Is this your first time submitting a feature request?
Describe the feature
Currently the
@
graph operator does not allow for depth control in any way and will always include all children of a model and their (the childrens) parents. I would like to be able to specify the depth of the child selection in the same way the children/parent operators allow for depth specification.example:
dbt ls -s 1@my_model
would result in the selection of the models children up to depth 1 + all the parents of those children. The same should be possible with a YAML selector definitionDescribe alternatives you've considered
I tried finding a workaround with complex yaml selector definitions but since we can't "layer" definitions (take all the 1 depth children of this model + then take all the 1 depth parents of all the models in the selection) I could not get it to work as I wanted.
Who will this benefit?
For our specific use case we want to run all seeds that are used in the tests of a specified model.
Currently we can achieve this with the
@
operator and a resource-type filter but this also includes any seeds that are used in tests of any of the child models of the specified model, which are not necessary for us and increases run times.Are you interested in contributing this feature?
sure, I haven't looked at the codebase structure but this feels like a good first issue.
Anything else?
Some extra context on where/why this would help us: https://github.com/duneanalytics/spellbook/pull/3408#issuecomment-1562445109