[CT-1952] [Feature] Provide additional Jinja tests on top of the built-in ones

b-per commented 1 year ago

Is this your first time submitting a feature request?

[X] I have read the expectations for open source contributors
[X] I have searched the existing issues, and I could not find an existing issue for this feature
[X] I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Jinja comes with built-in tests, but it is also possible to create custom ones.

When inspecting objects with Jinja in dbt, a few extra tests would make the code shorter and more readable.

The following code inspects the graph to search for tests that depend on at least one node.

{{ graph.nodes.values() 
    | selectattr("config.materialized","==","test") 
    | selectattr("depends_on.nodes") 
    | list | tojson(indent=1)
}}

If I want to search for tests applying to a given node, I then need to loop through the results. If we had a test called contains, we could do something like:

{{ graph.nodes.values() 
    | selectattr("config.materialized","==","test") 
    | selectattr("depends_on.nodes","contains","model.project.my_model") 
    | list | tojson(indent=1)
}}

which would return the list of tests that are applied to my_model.

The list of tests I can think of for now would be:

contains: checking if a list contains a given string
contains_substring: checking if a string contains a substring
matches: checking if a string matches a given regex
startswith: checking if a string starts with a specific substring
endswith: checking if a string ends with a specific substring
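Since Jinja tests are ordinary Python callables that receive the tested value as their first argument, the five tests listed above could be sketched roughly as follows. This is only an illustration: the function bodies, the PROPOSED_TESTS dict and the register helper are assumptions, not code from dbt-core, and whether matches should search anywhere in the string or anchor at the start is left open in this issue.

```python
import re


def contains(value, item):
    """True when the list `value` contains `item`."""
    return item in value


def contains_substring(value, substring):
    """True when the string `value` contains `substring`."""
    return substring in value


def matches(value, pattern):
    """True when `pattern` is found in `value`.

    Assumption: uses re.search, so the pattern may match anywhere
    unless it is anchored explicitly with ^ or $.
    """
    return re.search(pattern, value) is not None


def startswith(value, prefix):
    """True when the string `value` starts with `prefix`."""
    return value.startswith(prefix)


def endswith(value, suffix):
    """True when the string `value` ends with `suffix`."""
    return value.endswith(suffix)


# Hypothetical registry mapping test names to implementations.
PROPOSED_TESTS = {
    "contains": contains,
    "contains_substring": contains_substring,
    "matches": matches,
    "startswith": startswith,
    "endswith": endswith,
}


def register(env):
    """Add the tests to an object exposing a `tests` dict,
    such as a jinja2.Environment, and return it."""
    env.tests.update(PROPOSED_TESTS)
    return env
```

With a jinja2.Environment passed to register, a template could then write something like selectattr("name", "startswith", "stg_"). How dbt would expose these tests in its own Jinja context is not settled here.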

Describe alternatives you've considered

We can already achieve all of these outcomes by writing longer and more complex Jinja code. This feature would make it:

easier to understand what some Jinja code does
and easier to write it in the first place

Who will this benefit?

People writing Jinja macros as part of packages or custom logic

Are you interested in contributing this feature?

Yes

Anything else?

No response

dbeatty10 commented 1 year ago

The tests you described sound handy @b-per.

I haven't tried the built-in test named in. Does it behave the same way as your proposed contains?

b-per commented 1 year ago

It is actually the opposite.

a in b allows us to test that a is part of the list b
a contains b then allows us to test that b is part of the list a

selectattr() always passes the selected attribute as the first argument of the test, so we can't use in for the second case.

dbeatty10 commented 1 year ago

👍 Thanks for explaining @b-per.

Who would you see as the primary users of these new Jinja tests? Analytics engineers in their own dbt projects? Or would these be more for dbt package maintainers or dbt-core developers? I'm guessing this came up in the context of dbt-project-evaluator?

As an aside, I wonder how much conceptual overlap there is here with GPML? GPML is the graph pattern matching sub-language by WC3 that is the core of both SQL/PGQ and GQL which I think are scheduled to be published this year as part of SQL:2023 (Part 16 of ISO/IEC 9075).

b-per commented 1 year ago

My need mostly came from wanting to analyze the graph object from the IDE to get more familiar with a project I didn't know. For example, today the graph is the easiest way to find out the materialization of a model without checking whether it has been defined in the model itself, in the YML for the model, or at some level in dbt_project.yml.

That said, I believe that if it were available it would:

make it easier to write packages introspecting the dbt Jinja objects
make it easier for people to write on-run-start/end hooks
potentially allow for more explicit code in dbt-core (I would need to look at some code to see if there is some refactoring opportunity)

I am not too sure about the conceptual overlap. This looks to be more applicable to data and graph db when this issue feels more about a pure Jinja topic.

dbeatty10 commented 1 year ago

I am not too sure about the conceptual overlap. This looks to be more applicable to data and graph db when this issue feels more about a pure Jinja topic.

In terms of conceptual overlap, I just meant to highlight that it looks like you are trying to query the graph using Jinja as the programming language.

And I wanted to call out that it sounds like GPML is a graph pattern matching sub-language that might become a published standard sometime this year. Obviously it won't be something we can use in the near term, but it might be an option for similar use cases in the long term.

dbeatty10 commented 1 year ago

@jtcohen6 could you give your thoughts about the proposal of adding the following Jinja custom tests to those that are built-in?

contains: checks if a list contains a given string
contains_substring: checks if a string contains a substring
matches: checks if a string matches a given regex
startswith: checks if a string starts with a specific substring
endswith: checks if a string ends with a specific substring

See below for a quick summary of pros/cons.

Pros

The implementations might be quite simple
The proposed Jinja tests can be used as building blocks to:
- make it easier to write packages that introspect the dbt Jinja objects
- potentially allow for more explicit code in dbt-core
- make it easier for people to write on-run-start/end hooks

Cons

Although the proposed Jinja tests can be used as building blocks to discover the materialization of a model within an unfamiliar project, it still doesn't feel like an easy way to do so
More surface area of code to maintain within dbt-core
It is currently possible to write all the relevant logic in Jinja (albeit more clunky than the proposed methods)

github-actions[bot] commented 8 months ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

b-per commented 8 months ago

Commenting because I am still keen to see this implemented!

dbt-labs / dbt-core