Closed jpeak-intellify closed 3 years ago
@jpeak-intellify Thanks for opening! It's a really solid thought, and I hope we can kick off some cool discussion.
As we continue to solidify the foundations of `dbt-core`, I'm also interested in making it much more pluggable. We already have some thinking in this vein, ranging from the quite broad to the pretty specific:
So, why not the Jinja context, too?
I think there are two different ways to plug into / extend the syntax and functionality available in the dbt-Jinja templating context:

- built-in Jinja extensions, such as `do`
- Python modules exposed in the context, such as `pytz`, `datetime`, and `re`

Are you thinking more about turning on extensions that are installed along with `jinja2`, or installing custom extensions? If the extensions are all built-ins, I'm thinking that the right "on/off" switch could even be an environment variable (!), rather than another Python program/plugin. If you're talking about custom user-defined extensions, though, the need for a Python plugin makes sense.
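To make the environment-variable "on/off switch" idea concrete, here's a minimal sketch. The `DBT_JINJA_EXTENSIONS` variable name is hypothetical, not anything dbt reads today; it just toggles which built-in Jinja extensions get loaded into the environment:

```python
import os
import jinja2

# Hypothetical env var: comma-separated list of built-in extensions to enable.
ext_csv = os.environ.get("DBT_JINJA_EXTENSIONS", "jinja2.ext.do")
extensions = [name.strip() for name in ext_csv.split(",") if name.strip()]

env = jinja2.Environment(extensions=extensions)

# With jinja2.ext.do enabled, templates can run statements for side effects.
template = env.from_string(
    "{% set xs = [] %}{% do xs.append('a') %}{% do xs.append('b') %}{{ xs | join('-') }}"
)
print(template.render())  # a-b
```

Since the extension names are plain strings, the same switch could cover any importable `jinja2.ext.*` class without new plugin machinery.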
My concerns:

- `dbt-core` is using `jinja2==2.11.3`; the latest version available is `3.0.1`. I can't guarantee that we're going to eagerly upgrade to new Jinja versions. We expose a lot of Jinja syntax to users, and we'd risk breaking a lot of user-space code.
- Extensions could use Python modules such as `os` or `requests`, or custom Jinja extensions, to perform system/networking functionality. That's not a reason not to make changes in dbt-core, but it's definitely something I'm mindful of. As we think about the products we want to build into the future, our aim is to keep dbt-core support (and more broadly, dbt language support) as consistent and deployment-agnostic as possible.

> I am trying to automate overriding the `generate_database_name` and `generate_schema_name` macros but even if I install a dbt-package with my custom macro logic I still need to manually shim these macros in each repo.
This one is more specific to dbt functionality. I totally hear you, and I think there's a pretty neat way we can enable this!
There's a detailed proposal (https://github.com/dbt-labs/dbt/issues/3456) to specify `dbt` as the `macro_namespace` of all globally defined macros. This would allow you to do things like set this one-time config in each project:
```yml
# dbt_project.yml
dispatch:
  - macro_namespace: dbt
    search_order: ['current_root_project', 'my_package_with_custom_macro_logic', 'dbt']
```
Whenever dbt goes to resolve a "global" macro, it would look for candidate implementations (prefixed `adapter__` or `default__`) first in `current_root_project`, then in `my_package_with_custom_macro_logic`, then (finally) in the `dbt` repo.
Your issue prompted me to actually give this a go, which I did over in https://github.com/dbt-labs/dbt/pull/3851. I think it might... just... work :)
Now, to work for a specific global macro, that macro needs to be dispatched, and the `generate_x_name` macros aren't today. (Historically, we've tended to dispatch macros that require different syntax across database adapter plugins.) If we dispatch those, with the explicit `macro_namespace='dbt'`, and your package reimplements `default__generate_schema_name`, and you specify that config in your root project file, it should just work. It did when I tested locally just now.
Having said all that, I'm actually quite curious to hear about how you were planning to accomplish the above with Jinja extensions.
Thanks @jtcohen6 for the comprehensive context catchup.
It seems like overall we are on the same page about a larger scope of work around the extensibility of dbt-core, which is why I referenced pytest as prior art. dbt (in my mind) sits in a similar space, codifying the backbone of a common process.
dbt seems to have solved the problem for 80-90% of people, but you've given the enterprises that have been data warehousing for decades a teaser of a better future: a future where the sophistication of the tool matches the complexity of the data modelling. In honesty, that gap is a bit too enterprise-specific.
It was really awesome reading through the list of points of extension (which look like the hooks I was trying to find) that allow enterprises to handle their too-specific quirks.
My specific problem is:

- How do I codify our custom naming conventions for databases and schemas? A reusable dbt-package.
- How do I reliably get our macros preferred over the default global macros? 🤷
I think I naively went down the route thinking I am constrained to the Jinja2 environment and blindly suggested trying to gain access to modify the Jinja2 environment.
After looking at some Jinja extensions I am not confident it would achieve what I am after, and if it did, I'm not confident the person following me would be able to maintain that extension. So that was a bad idea in hindsight...
The following `dispatch` & `search_order` issue and PRs seem more in line with the specific problem I am trying to solve for.
PR: https://github.com/dbt-labs/dbt/pull/3851
This would allow us to refactor our database and schema naming conventions into a dbt package and evolve them independently. Our consumer dbt projects would know that our flavour of macros has higher precedence.
My only concern is that it is using `adapter.dispatch(...)`, and it is now dispatching for more reasons than just adapters. That can be a separate scope of work for that refactor-rename, though.
My ideal outcome is that we have our own BigCorp-flavoured dbt package, `dbt-bigcorp-macros`, that we reference internally like:

```yml
# packages.yml
packages:
  - git: "git@github.com:bigcorp/dbt-bigcorp-macros.git"
    revision: vX.Y.Z
```
Then it sounds like we would just create a dispatch namespace for them:

```sql
-- bigcorp-macros.sql
{% macro bigcorp__generate_schema_name(custom_schema_name, node) -%}
    ...
{% endmacro %}

{% macro bigcorp__generate_database_name(custom_database_name=none, node=none) -%}
    ...
{% endmacro %}
```
Then if I understand correctly, opting into custom ordering would add the following to `dbt_project.yml`:

```yml
# dbt_project.yml
dispatch:
  - macro_namespace: dbt
    search_order: ['my_root_project', 'bigcorp', 'dbt']
```
This would solve a whole class of problems for us, getting predictable behaviour when adopting our own flavour of conventions that deviate from the default dbt training modules.
I can `pip install` the draft branch (https://github.com/dbt-labs/dbt/archive/refs/heads/feat/allow-dispatch-global-macros.zip), POC our variant, and provide feedback on the above issue and PR to move this forward.
How does that sound?
@jpeak-intellify Thanks for the detailed follow-up, and for engaging with the multiplicity of thoughts I threw your way.
I think you've understood the dispatch strategy quite well; I just want to clarify a few points. You're right that the primary historical purpose of `adapter.dispatch` has been to handle cross-database functionality. The advent of `dispatch` in v0.18 (replacing `adapter_macro`) added a critical capability: dispatching across packages (macro namespaces). We initially intended this to solve a highly specific problem: extending `dbt_utils` macros to arbitrarily many database adapters, without requiring a PR into the main package for each one. But the capability has really expanded and matured since then: you define a "routing" macro that users call. Then, statically at parse time (!), on the basis of your project-level `dispatch` config and your current database adapter, dbt will figure out which macro they actually want to use.
So the syntax of `dispatch` retains both parts of the equation:

- a project-level `dispatch` config that defines `search_order`, allowing you to rank installed packages above other installed packages, or above the global project
- an adapter-prefix hierarchy (e.g. `redshift__` → `postgres__` → `default__`)

The workflow we've been describing primarily depends on the first piece, but you'll still need to prefix your macros with an adapter signifier (almost certainly `default__`). So it's exactly the code you shared above, except:
```sql
-- dbt_bigcorp_macros/macros.sql
{% macro default__generate_schema_name(custom_schema_name, node) -%}
    ...
{% endmacro %}

{% macro default__generate_database_name(custom_database_name=none, node=none) -%}
    ...
{% endmacro %}
```
The `bigcorp__` prefix would make sense if you were using a custom `dbt-bigcorp` database adapter. At the point where it's a preference for the candidate macros defined in the `dbt-bigcorp-macros` package, that's the role of the project config's `search_order`.
I get that the `default__` prefix may feel unnecessary in this case. But imagine if `dbt-bigcorp-macros` wanted to be usable by projects running on Apache Spark or Trino or UnknownAdapter, and do different things on each. Then you could define:
```sql
-- dbt_bigcorp_macros/macros.sql
{% macro spark__generate_schema_name(custom_schema_name, node) -%}
    ...
{% endmacro %}

{% macro trino__generate_schema_name(custom_schema_name, node) -%}
    ...
{% endmacro %}

{% macro default__generate_schema_name(custom_schema_name, node) -%}
    ...
{% endmacro %}
```
Following your `search_order`, dbt would always prefer a macro in your package. It would pick `dbt_bigcorp_macros.spark__generate_schema_name` on Spark, `dbt_bigcorp_macros.trino__generate_schema_name` on Trino, and `dbt_bigcorp_macros.default__generate_schema_name` when running anywhere else.
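As a toy model of that resolution order (this is a sketch for intuition, not dbt-core's actual implementation), you can think of it as: walk the configured `search_order`, and within each package prefer the adapter-specific candidate over `default__`:

```python
# Toy sketch of dispatch resolution, not dbt-core's real code: for each
# package in search_order, prefer <adapter>__ over default__ candidates.
def resolve(macro_name, adapter, search_order, defined):
    for package in search_order:
        for prefix in (adapter + "__", "default__"):
            candidate = (package, prefix + macro_name)
            if candidate in defined:
                return candidate
    raise LookupError(macro_name)

# Macros defined across packages, as (package, macro) pairs.
defined = {
    ("dbt_bigcorp_macros", "spark__generate_schema_name"),
    ("dbt_bigcorp_macros", "trino__generate_schema_name"),
    ("dbt_bigcorp_macros", "default__generate_schema_name"),
    ("dbt", "default__generate_schema_name"),
}
order = ["my_root_project", "dbt_bigcorp_macros", "dbt"]

print(resolve("generate_schema_name", "spark", order, defined))
# ('dbt_bigcorp_macros', 'spark__generate_schema_name')
print(resolve("generate_schema_name", "redshift", order, defined))
# ('dbt_bigcorp_macros', 'default__generate_schema_name')
```

On an adapter with no specific implementation (Redshift here), the package's `default__` still wins over the global `dbt` macro, which is exactly the "always prefer your package" behaviour described above.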
So, if that all still sounds okay to you, I'm going to:

- `feat/allow-dispatch-global-macros` branch

Thanks for the clarifications about the use of dispatched namespaces.
That makes sense that the macros are resolved statically at parse time.
Describe the feature
As a dbt user I should be able to install extensions through `pip` such that I can: `pip install dbt-ext-myextension` OR `pip install dbt-jinja-ext-myextension`
And the `dbt` CLI autoloads these extensions, similar to pytest, allowing control of the event hooks and the Jinja environment for custom behaviour.

Describe alternatives you've considered
I am trying to automate overriding the `generate_database_name` and `generate_schema_name` macros but even if I install a dbt-package with my custom macro logic I still need to manually shim these macros in each repo.

Additional context
N/A
Who will this benefit?
Power users whose architecture or naming conventions, dictated by business users, have outrun what dbt Out-Of-the-Box (OOB) can achieve.
Ideally, allowing the Jinja environment to be extended would be an end goal here. It looks like the `do` command is already being used from `jinja2.ext.do`: https://github.com/dbt-labs/dbt/blob/develop/core/dbt/clients/jinja.py#L489

My ideal UX/DX would be:

- `pip install dbt-jinja-ext-myextension`.
- `dbt` then uses the `importlib.metadata.distributions()` functionality, like `pluggy` does, for autodiscovery.

This leverages prior art for plugin design. It is minimally invasive (leaving plugins for dbt event hooks as a separate scope of work). It leverages the Jinja Extension community as the scope of what can be done. Authoring a Python extension better leverages Python tooling to maintain that code than Jinja macros in a dbt package.
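As a sketch of that pluggy-style autodiscovery, dbt could scan installed distributions for a naming convention. The `dbt-jinja-ext-` prefix and the function below are hypothetical, illustrating the mechanism rather than any real dbt API:

```python
from importlib.metadata import distributions

# Hypothetical convention: any installed distribution whose name starts with
# "dbt-jinja-ext-" is treated as a dbt Jinja extension to autoload.
def discover_jinja_extensions(prefix="dbt-jinja-ext-"):
    found = []
    for dist in distributions():
        name = dist.metadata["Name"] or ""
        if name.startswith(prefix):
            found.append(name)
    return sorted(found)

print(discover_jinja_extensions())  # [] unless matching packages are installed
```

Worth noting that pluggy itself discovers plugins via entry points (`PluginManager.load_setuptools_entrypoints`) rather than name prefixes; either mechanism would fit the UX described here.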
Are you interested in contributing this feature?
Happy to help contribute this feature. Wanted to discuss scope of work before starting work though to fit in the roadmap.
For reference, I was a maintainer of the `vcrpy` API testing library for 6 months in 2019.