MartinGuindon opened this issue 3 years ago.
@MartinGuindon Thanks for the detailed writeup!
> I would have assumed that schema tests on snapshots would be unaffected by the `ref()` override, just like schema tests on sources are unaffected.
The reason here is simple: you use the `ref()` macro to reference snapshots, and the `source()` macro to reference sources. You're overriding the `ref()` macro but not the `source()` macro.
I don't think we should make a dbt change here: if you construct your own `ref()` macro on top of `builtins.ref`, and then proceed to call the `ref()` macro, dbt shouldn't attempt to subvert your custom definition. I do think you have a few potential approaches here:
1. Define a custom `ref` macro just for referencing snapshots.
```sql
{% macro ref_snapshot(snapshot_name) %}
    {# Call the builtin ref directly, so the database is kept in the rendered name #}
    {% do return(builtins.ref(snapshot_name)) %}
{% endmacro %}
```
Use `{{ ref_snapshot('snapshot_name') }}` in lieu of `{{ ref('snapshot_name') }}` throughout your project.
2. Use a common naming convention for your snapshots, and add conditional logic to your custom `ref` function.
```sql
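{% macro ref(model_name) %}
    {# Snapshots follow the 'snap_' naming convention: keep the fully qualified name #}
    {% if model_name.startswith('snap_') %}
        {% do return(builtins.ref(model_name)) %}
    {% else %}
        {# Everything else is rendered without the database #}
        {% do return(builtins.ref(model_name).include(database=false)) %}
    {% endif %}
{% endmacro %}
```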
I'm going to close this, but I remain curious to hear what you think!
Hi @jtcohen6,
I'm confused. I'm not explicitly using the `ref()` function, as I'm not talking about a model that builds on a snapshot. If I were to use the `ref()` function in a model, pointing to a snapshot, I would fully understand that this behavior is normal (which is why snapshot tables are configured as sources before they are used in models).
I'm talking about defining a schema test like `not_null` within the snapshot properties YAML file. It's the value passed to the schema test through the `{{ model }}` argument that is incorrect. I imagine that it's using the `ref()` function under the hood, even though that isn't apparent in the schema test.
So option 1 is not possible. Option 2 works; I just tested it. However, I feel like this is a workaround and not a long-term solution. Shouldn't schema tests defined on a snapshot render against the snapshot's configured database/schema, since, unlike models, those are not dynamic?
Wow! I'm sorry, I completely skipped over the fact that this is about schema tests. Let me re-open and think about this some more.
@MartinGuindon Apologies again for misunderstanding your original issue. This is... tricky!
I admit it feels a bit weird that overriding the builtin `ref` macro changes how all non-source relations are rendered, specifically `{{ model }}` in schema test definitions. I do think it's the intended behavior, however. We could either:
- Add special handling for `snapshots`, similar to what we have for `sources`. We create relations differently for a resource, depending on whether it's a source vs. any other node type.
- Expect users who override the `ref` macro to handle this conditional, along the lines of option #2 above. This is what feels most right to me: overriding a builtin is already a high-difficulty move, and there should be knobs to turn in user-land that give the dbt developer the ability to tweak this behavior.

In most projects, snapshots live in one or more stable, set-aside databases. Is that true in your case as well? Given that, here's the best answer I've got right now:
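```sql
{% macro ref(model_name) %}
    {% set rel = builtins.ref(model_name) %}
    {# Debugging aid: log the relation's fields to the console #}
    {% do log(rel.values(), info = true) %}
    {% if rel.database == 'snapshots' %}
        {# Snapshots live in the dedicated 'snapshots' database: keep it in the name #}
        {% do return(builtins.ref(model_name)) %}
    {% else %}
        {# All other relations are rendered without the database #}
        {% do return(builtins.ref(model_name).include(database=false)) %}
    {% endif %}
{% endmacro %}
```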
This worked when I tested it locally, running schema tests against snapshots configured with `target_database = 'snapshots'`.
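For context, the macro above keys entirely off the snapshot's configured database. A snapshot set up that way might look like the sketch below; the snapshot name, source, and column names are hypothetical, and only `target_database='snapshots'` comes from this thread.

```sql
{% snapshot orders_snapshot %}
{# Hypothetical snapshot: only target_database='snapshots' comes from this thread #}
{{
    config(
        target_database='snapshots',
        target_schema='snapshots',
        unique_key='id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}
select * from {{ source('app', 'orders') }}
{% endsnapshot %}
```

Because `rel.database` is read from the resolved relation, any snapshot whose `target_database` is something other than `'snapshots'` would fall into the `else` branch and lose its database.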
What I'd love to do here is make the conditional check `{% if rel.resource_type == 'snapshot' %}`. I don't believe that's possible today: although a Relation directly corresponds to resources like models and snapshots, the dbt Relation object itself (which `ref()` returns) only contains information about its database representation. It feels plausible, though; I'm not sure how tricky it would be to implement.
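A possible user-land approximation of that `resource_type` check, one that doesn't depend on a dedicated snapshots database, is to look the name up in dbt's `graph` context variable. This is an untested sketch under stated assumptions: it assumes each entry in `graph.nodes` exposes `resource_type` and `name`, that the lookup only needs to work at execution time (guarded by `execute`, since `graph` may not be populated while parsing), and, like the other overrides in this thread, it ignores the two-argument form of `ref()`. It is meant to replace the override above, not sit alongside it.

```sql
{% macro ref(model_name) %}
    {% set rel = builtins.ref(model_name) %}
    {# Assumption: graph.nodes is populated at execution time; while parsing,
       no match is found and the else branch is used #}
    {% set snapshot_nodes = [] %}
    {% if execute %}
        {# Find snapshot nodes whose name matches the referenced name #}
        {% set snapshot_nodes = graph.nodes.values()
            | selectattr('resource_type', 'equalto', 'snapshot')
            | selectattr('name', 'equalto', model_name)
            | list %}
    {% endif %}
    {% if snapshot_nodes | length > 0 %}
        {# Snapshot: keep the fully qualified database.schema.identifier #}
        {% do return(rel) %}
    {% else %}
        {# Any other node: drop the database, as in the original override #}
        {% do return(rel.include(database=false)) %}
    {% endif %}
{% endmacro %}
```

The trade-off against the `rel.database == 'snapshots'` check is that this works wherever snapshots are stored, at the cost of scanning `graph.nodes` on every `ref()` call, which may be noticeable in large projects.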
@jtcohen6 I'd think that special handling for snapshots would be ideal, but in the meantime using `{% if rel.database == 'snapshots' %}` works for me: since we're on Snowflake, we have a dedicated database for snapshots.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.
Describe the bug
When overriding the ref() macro to render identifiers without a database (as documented here), schema tests defined for snapshots in a snapshot properties file are rendered without a database.
As snapshots are "static" based on the snapshot configuration and not environment-based, I'm a bit surprised by this behavior. I would have assumed that schema tests on snapshots would be unaffected by the `ref()` override, just like schema tests on sources are unaffected.
I therefore assume that this is a bug.
Steps To Reproduce
```sql
{% macro ref(model_name) %}
    {% do return(builtins.ref(model_name).include(database=false)) %}
{% endmacro %}
```
```yaml
version: 2
snapshots:
```
Expected behavior
Schema tests for snapshots should ignore the `ref()` override and simply point to the configured database/schema of the specific snapshot.
System information
- Which database are you using dbt with?
- The output of `dbt --version`:
- The operating system you're using: MacOS 10.15.7
- The output of `python --version`: Python 3.7.6