dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.36k stars 1.56k forks source link

[Feature] Support `dispatch` for materializations to use implementations defined in installed packages #10090

Open jtcohen6 opened 2 months ago

jtcohen6 commented 2 months ago

Is this your first time submitting a feature request?

Describe the feature

This is a way of preserving a more granular version of the previous behavior, which would implicitly (and somewhat surprisingly) use materializations from packages that override builtins (view, table, incremental, test, etc).

We are opting for this syntax:

dispatch:
  - type: materialization
    # macro_namespace is not needed, because all materializations are global
    search_order: ['elementary', 'dbt']
  - type: macro
    macro_namespace: dbt_utils
    search_order: [...]

Considerations

If not specified, the default would remain:

  1. Root package first
  2. Then builtin implementations within dbt-core / adapters

Which is how it works for materializations with the flag, and for dispatched macros.

Implementation

The relevant method is find_materialization_macro_by_name.

Michelle spiked a similar capability, though by pulling the "allowlist" from Project.flags rather than Project.dispatch: https://github.com/dbt-labs/dbt-core/pull/9956/commits/99998a597c75d71f85b85202e31de6cff984249e

Describe alternatives you've considered

(1) Not doing this

Users must reimplement materializations one-by-one, by defining them in their root project and calling the implementations in packages.

{% materialization table, snowflake %}
  {{ return(elementary.materialization_table_snowflake()) }}
{% endmaterialization %}

{% materialization incremental, snowflake %}
  {{ return(elementary.materialization_incremental_snowflake()) }}
{% endmaterialization %}

-- etc
-- differs by adapter

(2) Bundling with existing macro dispatch

Under the hood, these materializations are macros, and defined within the 'dbt' namespace. But I like the idea of continuing to keep the two separate, for three reasons:

  1. Avoid tying ourselves forever to this implementation details (that materializations are macros)
  2. Materializations are only callable from the "global" namespace, whereas all other macros can be called by namespace
  3. It's conceivable that users would want different behavior for materializations versus other built-in macros

(3) Different syntax that wouldn't require evolving the type of dispatch config

# 'materialization' is just a special macro_namespace
dispatch:
  - macro_namespace: materialization
    search_order: ['elementary', 'dbt']

Edge case: this would not play well with an installed package named materialization.

Who will this benefit?

Are you interested in contributing this feature?

Yes, with the help of the elementarians

Anything else?

We should do this for dbt Core v1.8.x only. No backports. While we introduced the deprecation warning in dbt-core v1.6 + v1.7, it's still the default behavior in those versions to use implementations defined in packages.

We will document in the v1.8 upgrade guide and "legacy behaviors" the recommended sequence of:

dbeatty10 commented 2 months ago

Overall, either of these options seem like they could work.

I'm more drawn to the second option because it seems it seems closest conceptually to current.

The first option has a slight hack feel to it along with the edge case where it would not play well with an installed package named materialization.

If we choose the second option, would users be able to mix-n-match like the following? Or would - type become required?

# second option: new attribute
dispatch:

  - type: materialization
    # macro_namespace is not needed, because all materializations are global
    search_order: ['elementary', 'dbt']

  - macro_namespace: dbt_utils
    # type is not needed here because `macro` is the default when not specified
    search_order: ['my_project', 'dbt_utils']

Out of curiosity, would/could either of these work from a technical perspective if we choose the second option?

# second option: new attribute
dispatch:
  - macro_namespace: dbt
    type: materialization
    search_order: ['elementary', 'dbt']
# second option: new attribute
dispatch:
  - macro_namespace: none
    type: materialization
    search_order: ['elementary', 'dbt']
jtcohen6 commented 2 months ago

@dbeatty10 I agree with your preference for a new type attribute.

I think users could "mix-n-match". The default type would be macro, and we already have custom validation logic where we could apply special handling: "If type: materialization and macro_namespace is not specified, it is none or 'dbt' by default."

And I like where your "out of curiosity" thought is going. This syntax would be extensible to eventually support namespacing for materializations, and finer-grained search orders for each namespace. That's a good future possibility, out of scope for whatever we're doing here. For now, this will be a single search order for all materializations.

jtcohen6 commented 2 months ago

@Maayan-s @haritamar Does this proposal make sense to you both? Would you still be up for helping with the contribution?

haritamar commented 2 months ago

Hi @jtcohen6 , yes this proposal sounds good on our end, and we'll be happy to help with contribution!