Open johnson-jay-l opened 1 month ago
Here's what I've tried:
test_table.sql
{{
config(
materialized='table'
)
}}
select 1 as col1
union
select 2 as col2
test_table_clone.sql
{{
config(
materialized='clone',
this_relation='test_table',
defer_relation='test_table',
schema='bi'
)
}}
dbt run --target databricks-dev-local --select test_table
04:36:00 ========== Begin Summary ==========
04:36:00 node: model.calm_main.test_table; status: success (error: )
04:36:00 ========== End Summary ==========
04:36:00 1 of 1 START hook: calm_main.on-run-end.0 ...................................... [RUN]
04:36:00 1 of 1 OK hook: calm_main.on-run-end.0 ......................................... [OK in 0.00s]
04:36:00
04:36:01
04:36:01 Finished running 1 table model, 1 project hook in 0 hours 0 minutes and 30.69 seconds (30.69s).
04:36:03
04:36:03 Completed successfully
04:36:03
04:36:03 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
dbt run --target databricks-dev-local --select test_table_clone
04:36:25 Found 1254 models, 40 snapshots, 23 seeds, 1 operation, 2795 data tests, 1773 sources, 1050 macros
04:36:25
04:36:34 Concurrency: 4 threads (target='databricks-dev-local')
04:36:34
04:36:34 1 of 1 START sql clone model sandbox.test_table_clone .................. [RUN]
04:36:34 No relation found in state manifest for model.calm_main.test_table_clone
04:36:34 1 of 1 ERROR creating sql clone model sandbox.test_table_clone ......... [ERROR in 0.04s]
Question - why is it looking for the state manifest? Is there a way to ignore the state manifest and create the clone as long as the upstream task was successful? We often have downstream models running in separate container environments that don't have awareness of the upstream models' state manifests because the success/failure state is managed outside of dbt.
I'm reasonably certain that we don't actually support clone as a materialization, but as support for dbt clone
In theory it seems like a custom materialization could work that uses cloning. I got pretty far with this example below based on table.sql
and then calling clone()
from a model.sql file instead of using the standard config
block.
Would this general direction potentially work as a model materialization type? Or, what would it take to make the clone.sql logic work with models and not just the CLI? Looking for guidance on which direction makes the most sense.
{% macro clone(source_model_name, source_schema, schema, tags, pre_hook, post_hook) -%}
{%- set source_model_name = source_model_name -%}
{%- set source_schema = source_schema -%}
{%- set schema = schema -%}
{%- set tags = tags | default([]) -%}
{%- set pre_hook = pre_hook | default('') -%}
{%- set post_hook = post_hook | default('') -%}
{{
config(
materialized = 'databricks_clone',
source_model_name = source_model_name,
source_schema = source_schema,
schema = schema,
tags = tags,
pre_hook = pre_hook,
post_hook = post_hook
)
}}
-- depends_on: {{ ref(source_model_name) }}
{%- endmacro %}
{% materialization databricks_clone, adapter = 'databricks' %}
{{ log("MATERIALIZING CLONE") }}
{%- set source_model_name = config.get('source_model_name') -%}
{%- set source_schema = config.get('source_schema') -%}
{%- set target_schema = schema -%}
{%- set grant_config = config.get('grants') -%}
{%- set tags = config.get('databricks_tags') -%}
{{ log("database: " ~ database, info=True) }}
{{ log("source_model_name: " ~ source_model_name, info=True) }}
{{ log("source_schema: " ~ source_schema, info=True) }}
{{ log("target_schema: " ~ schema, info=True) }}
{%- set target_relation = adapter.get_relation(database=database, schema=target_schema, identifier=source_model_name, needs_information=True) -%}
{{ log("target_relation: " ~ target_relation, info=True) }}
{{ log(target_relation, info=True) }}
{%- set source_relation = adapter.get_relation(database=database, schema=source_schema, identifier=source_model_name, needs_information=True) -%}
{{ log("source_relation: " ~ source_relation, info=True) }}
{{ run_hooks(pre_hooks) }}
-- build model
{%- call statement('main', language='sql') -%}
create or replace
table {{ target_relation }}
shallow clone {{ source_relation }}
{%- endcall -%}
{% set should_revoke = should_revoke(old_relation, full_refresh_mode=True) %}
{% do apply_grants(target_relation, grant_config, should_revoke) %}
{%- do apply_tags(target_relation, tags) -%}
{% do persist_docs(target_relation, model, for_relation=language=='python') %}
{% do persist_constraints(target_relation, model) %}
{% do optimize(target_relation) %}
{{ run_hooks(post_hooks) }}
{{ return({'relations': [target_relation]})}}
{% endmaterialization %}
First we'd have to remove the blockers to custom materializations (another open ticket) :/. I suggest raising the issue of clone as a supported materialization with dbt Labs, because your use case seems reasonable to me.
I'm new to custom materializations and was going off this doc which implies that custom materializations are supported: https://docs.getdbt.com/guides/create-new-materializations?step=1
Could you elaborate on the blockers? Happy to open another ticket and/or contribute to the code but I need to better understand what I should be asking for vs what is already supported.
I went ahead and created the dbt-core issue and linked it to this one
There are places in dbt-databricks that explicitly enumerate supported materializations that we need to remove.
Describe the bug
I would like to try using the materialization type of
clone
so that I can have a model that "publishes" tested data runs downstream from the actual model being tested. The clone should have the same table name as the upstream model but should be in a different database schema.This materialization type is not documented and I can't figure out how to get it to work for my use case.
Code for the
clone
materialization: https://github.com/databricks/dbt-databricks/blob/main/dbt/include/databricks/macros/materializations/clone.sqlSteps To Reproduce
clone
materialization typeExpected behavior
Screenshots and log output
n/a
System information
The output of
dbt --version
:The operating system you're using:
The output of
python --version
:Additional context