Open solomonshorser opened 2 years ago
Hey @solomonshorser, the short answer is that --selector
takes precedence, and --exclude
is ignored.
This has come up before: https://github.com/dbt-labs/docs.getdbt.com/issues/803 (originally transferred from this repo). The solution proposed then was to update our docs. It's there now, but tucked away in a note at the very bottom of https://docs.getdbt.com/reference/node-selection/syntax:
Note that when you're using
--selector
, most other flags (namely--select
and--exclude
) will be ignored.
This could definitely be clearer! Do you think the right answer looks like:
--selector
and one of --select
/ --exclude
@jtcohen6 I had only read the YAML Selectors page, so some clarification there would help. A warning message might be nice too, it would have saved me some frustration. Or maybe instead of a warning, an actual hard-stop error message: "Incompatible selection criteria were given..." or something like that.
Back to my original problem: I have a selector (not a simple one) that selects a number of models. I'd like to test everything covered by that selector, but there is one particular model that has some known testing issues. I'd like to be able to test all of the models except that problematic model (until other parties are able to either confirm certain changes to the test or the underlying data). Is there any way to do this besides modifying the YAML selector itself? This exclusion is not meant to be permanent, just something I want to do on an ad-hoc basis.
Makes total sense. https://github.com/dbt-labs/dbt-core/pull/4827 proposes to make possible the inheritance of one selector by another. That alone should provide for a more ergonomic approach to your situation.
A bigger question, one I'm not sure I feel ready to answer: Should passing --selector
alongside --select
/--exclude
have the effect of inheriting the selector, and then applying the selection/exclusion logic? What about passing --select
/--exclude
at all when there's a selector defined with default: true
? Historically, our approach has been:
--select
/--exclude
causes a default selector to be ignored entirely--selector
flag causes --select
/--exclude
to be ignored entirelyShould passing
--selector
alongside--select
/--exclude
have the effect of inheriting the selector, and then applying the selection/exclusion logic?
That would make sense to me, though I don't know if "inherit" is the word?
I thought of it as the set of whatever was selected by --selector
unioned with whatever is selected by --select
then subtract whatever is excluded by --exclude
.
Something like: dbt test --selector staging_models --select extra_models --exclude problematic_models
I thought would have an effect like running on the set: staging_models UNION extra_models MINUS problematic_models
Just encountered a case where this would be very helpful functionality. Generally we'll want a standard selector, but being able to add an additional intersection, union, or exclude would be great so that we can tweak a selector for a particular execution dynamically.
I also have a use case where I'd like to use selectors in both --select and --exclude arguments. Example where I run all the marketing models but exclude any ecommerce models in the marketing DAG:
dbt run --select selector:marketing --exclude selector:ecommerce
I think this would be a very nice behaviour to add indeed.
Another interesting use case would be to build only models from a selector that passed tests on the previous run:
dbt build --selector <my_selector> --exclude 1+result:fail --state ./target
.
I've come around on this. Here's how I think it should work:
--selector
is resolved first. Then, it's "unioned" together with any criteria passed into --select
&/or --exclude
.--select
/--exclude
is passed (status quo). You need to explicitly pass both --selector
and --select
/--exclude
to achieve the combination.I'm going to mark this as help_wanted
for any community member who'd be interested in working on it :)
I think it would be good if the following could be done
I believe this would be backwards compatible but also mostly future complete as in it provides intuitive ways to accomplish whatever selection is desired through a combination of predefined and command line options.
If the command line syntax worked identically in the yml, then it could also simplify the documentation and things would just work the way the end user expects them to work.
Also, happy to help on this. Curious if the select/exclude is represented in the same way internally as each selector?
https://github.com/dbt-labs/dbt-core/pull/7101
Starting to work out how this could be done in the PR above, have only partially tested
I believe that this is ready for review through #7101
I did things differently than what @jtcohen6 suggested.
Basically, --select / --exclude create a new selector named "arg_selector" which can be referenced in selectors.yml. The select / exclude syntax also supports selector:
Realize it is different, but I think the change in the PR basically makes the change seem seemless.
Also, unified the syntax between the yaml selectors and the command line (introducing EXCEPT).
dbt --select "selector:selector_one EXCEPT some_model+"
is the same as
dbt --select selector:selector_one --exclude some_model+
In addition, the selector yaml syntax allows selector:... and the union/intersect,EXCEPT logic.
definition: selector:selector_one EXCEPT some_model+
@acurtis-evi This would work for the use case I described. Did I understand correctly that both
dbt run --select selector:marketing --exclude selector:ecommerce
, and dbt run --select "selector:marketing EXCEPT selector:ecommerce"
would work and are interchangeable after the change?
I also have a use case where I'd like to use selectors in both --select and --exclude arguments. Example where I run all the marketing models but exclude any ecommerce models in the marketing DAG:
dbt run --select selector:marketing --exclude selector:ecommerce
Yes, both options would work. In addition to this, the yaml selector syntax is unified with the command line syntax. The old yaml syntax is supported, but the plain description can simply be a complex string with parent, child relations, unions, intersections, and EXCEPT.
selectors:
name: some_selector
description: selector:marketing EXCEPT selector:ecommerce
and then you can do the dbt run using
dbt run --selector some_selector
I also introduced an arg_selector
dbt run --select selector:marketing --exclude selector:ecommerce
AND
dbt run --selector arg_selector --select selector:marketing --exclude selector:ecommerce
are identical and the arg_selector can be used in the yaml selectors like this
selectors:
name: some_selected_selector
description: selector:some_selector,selector:arg_selector
name: some_selector
description: selector:marketing EXCEPT selector:ecommerce
Now you can use that selector like
dbt run --selector some_selected_selector --select some_models+
and it will intersect selector:marketing EXCEPT selector:ecommerce
and some_models+
The selector:some_selector allows for modifiers as well such as +1
dbt run --select selector:marketing+1 --exclude selector:ecommerce
@acurtis-evi thanks for the PR! This feature would definitely be helpful for a use-case that we have. Is there any new info on this issue?
# ENV
dbt-snowflake==1.7.1
Found this issue while looking for more info about the selector argument - here's an interesting edge case I have below:
$my_project_name
to only check for the nodes of $my_project_name
by default
dbt run
and it will only run the models of $my_project_name
marts
dbt_project_evaluator
also has a layer of models called marts
Finally, when I'm running the following commands I don't have the same result and behaviour I would have excpected
dbt ls --select models/marts
$my_project_name
and dbt_project_evaluator
)
dbt ls --selector $my_project_name --select models/marts
$my_project_name
and not using the --select
argumentExpected result from both above commands: returns only marts models from $my_project_name
I also think a combination of the arguments --select
/--exclude
with--selector
should be implemented, there is a high risk users are not using this correctly and can wrongly select nodes from their project and installed packages
I'd love to see the ability to combine selectors + command-line selection. My use case is that I'm trying to define super-sets of models I want to run so I have a consistent definition - for example, "things I want to run hourly". Then I want to intersect that with a selector per job, for example, "data_product_x and its parents and children".
Defining additional selectors for each data product feels like overkill to me. Ideally I could run this with something like:
dbt ls --select selector:hourly,+data_product_x+
Any chance we might see at least the selector:
method added? This issue's been around for a while as has the PR.
@djbelknapaw: the way we're handling this right now is via YAML anchors. It's not as user friendly, but at least it works well for now.
Example:
- name: data_product_x
definition:
union: &data_product_x
- 'model1'
- 'model2'
- '...'
- name: data_product_x_hourly
definition:
intersection:
- union: *data_product_x
- tag: hourly
This was a bit of surprise to find as an issue today. Would really appreciate if this could be resolved. We are using anchors as suggested, but its not clean. It adds overhead.
Further, it'd be great if these all didn't have to be in a single selectors.yml
file, and could be broken out into separate files like schema.yml files can for maintainability.
Agreed, multiple files would be great! (although that's probably a separate feature request)
Our current selector.yml
is over 1000 lines....
It should be fairly simple to merge the selectors from multiple files into one single list in python (as long as all selectors are unique, otherwise raise an exception). 🤷♂️
I think the most explicit and powerful way to blend CLI selection with YAML selectors would be the introduction of a new selector method in order to explicitly select a selector (😮💨).
For example, instead of dbt test --selector staging --exclude stg_client
(from the OP), the syntax could be dbt test -s selector:staging --exclude stg_client
, which would make the intention very explicit (i.e. select everything from the selector, and exclude stg_client
).
Bonus: this would also support the entire power of CLI selection, such as unions & intersections (for example: dbt test --selector:staging,tag:critical selector:staging,tag:important
, or even better: dbt test --selector:staging,selector:critical_and_important
), and it would thereby address #10991 & #10596 (and probably others).
In fact, selector inheritance already exists (see also #1628, #4821, and #4827), which defines a selector
method. But unfortunately, this method is not available on the command line (only in selectors.yml
). 😞
Actually, this felt significant enough that I created a new feature request: #10992 (also since this one is pretty old and seems dead in the water).
Is there an existing issue for this?
Current Behavior
I try to test my models like this:
However, tests that apply to
stg_client
(a staging model) are still executed.Expected Behavior
I expected the tests that relate to the model
stg_client
to be skipped, since I specified--exclude stg_client
.Steps To Reproduce
--exclude
to exclude a single model from testing.Relevant log output
No response
Environment
What database are you using dbt with?
bigquery
Additional Context
No response