Open DVAlexHiggs opened 2 years ago
I think the problem isn't so much that meta isn't available at compile time, it's that the config from properties.yml isn't available when the ref is processed. A dbt project is parsed in the order: 1) macros, 2) models, 3) schema files. When the model is "parsed", we extract the ref, source, and config calls and use them to set information in the nodes. It doesn't even help to put the meta config in the model, because that data is extracted at the same time as the ref. The model that is ref'd must resolve to a string at parse time, or we cannot construct the DAG properly.
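To make the ordering constraint concrete, here is a toy sketch (emphatically not dbt's real parser, which uses Jinja capture rather than regex): at parse time, a dependency edge can only be recorded when the argument to `ref()` is a string literal, because no expressions have been evaluated yet.

```python
import re

# Toy illustration (NOT dbt's actual parser): at parse time, only literal
# string arguments to ref() can be turned into DAG edges.
REF_LITERAL = re.compile(r"ref\(\s*'([^']+)'\s*\)")

def extract_edges(model_sql):
    """Return the upstream model names resolvable at parse time."""
    return REF_LITERAL.findall(model_sql)

static_model = "SELECT * FROM {{ ref('my_stage') }}"
dynamic_model = "SELECT * FROM {{ ref(var('source_model_name')) }}"

print(extract_edges(static_model))   # ['my_stage'] -- edge is known
print(extract_edges(dynamic_model))  # [] -- argument is an expression, edge unknown
```

The second model's dependency is invisible to this kind of extraction, which is why the ref'd model "must resolve to a string at parse time".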
We do have discussions on a regular basis about whether there are advantages to tweaking the order that things are processed, or about moving some flavors of setting to a different place. But these are all pretty complicated and delicate things to change since they are so central to the way that dbt works.
Let us know if you have other ideas about how to make this easier to work with.
Apologies, accidentally closed.
Thanks for your response. Back in the old days we could scope variables to specific models: the variables were provided in `dbt_project.yml`, and the directory structure assigned them to a specific model. This worked, but it quickly became hard to manage, as `dbt_project.yml` grew larger and larger for even the smallest projects.

We should not revert to that, but I'm wondering if model-scoped variables, provided in `properties.yml`, would offer the same capability whilst also being vastly superior in terms of scaling; we would gain all of the advantages of `properties.yml`, such as arbitrary nesting and the ability to split the variables across multiple files. Example:
```yaml
# properties.yml
version: 2

models:
  - name: my_table
    variables:
      param_1: COLUMN_1
      param_2: COLUMN_2
      param_3: COLUMN_3
      param_4: COLUMN_4
      source_model_name: my_stage
```
```sql
-- my_table.sql
{{ config(materialized='table') }}

SELECT 1 AS "{{ var('param_1') }}"
     , 2 AS "{{ var('param_2') }}"
     , 3 AS "{{ var('param_3') }}"
     , 4 AS "{{ var('param_4') }}"
FROM {{ ref(var('source_model_name')) }}
```
Perhaps, then, processing variables at the same stage as macros (or just before or after macros, but in any case before models) would be a minimal solution to this, without undesirable impact on the fundamental operation of dbt as a whole?
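The ordering this proposal implies can be sketched as a toy two-stage resolver (hypothetical, not dbt code): model-scoped variables are loaded first, substituted into `var()` calls, and only then are refs extracted, at which point every `ref()` argument is a literal.

```python
import re

# Hypothetical sketch of the proposed ordering: model-scoped variables are
# resolved BEFORE refs are extracted, so ref(var('x')) becomes a literal.
model_vars = {"my_table": {"source_model_name": "my_stage"}}

VAR_CALL = re.compile(r"var\(\s*'([^']+)'\s*\)")

def resolve_vars(model_name, sql):
    """Replace var('name') with the model-scoped value, if one is defined."""
    scope = model_vars.get(model_name, {})
    return VAR_CALL.sub(lambda m: "'" + scope[m.group(1)] + "'", sql)

sql = "SELECT * FROM {{ ref(var('source_model_name')) }}"
resolved = resolve_vars("my_table", sql)
print(resolved)  # SELECT * FROM {{ ref('my_stage') }}
```

After this substitution pass, the existing parse-time ref extraction would work unchanged.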
Really eager to find a solution to this, as it will have huge implications, as described in my initial proposal.
EDIT:
I've thought about this a little more, and the downside I can see is that it's perhaps not intuitive: these parameters fundamentally configure the models, rather than acting as variables per se.
@DVAlexHiggs does replacing e.g.:

```sql
{% set source_model_name = config.get('meta')['source_model_name'] -%}
```

with

```sql
{% set source_model_name = model.config.meta.source_model_name -%}
```

work for you as of now? Otherwise, can you provide a minimal working example of a project, and I could check whether your issue is the same one I ran across.
Edit: This is actually advised against here. The workaround would be to use the `execute` flag as advised, but that requires you to add `-- depends_on: ref(...` back to the referenced model, which is exactly what you were hoping to avoid.
Hi @scajanus. Yes, the advice against this is the reason I hadn't explored that option. I can confirm it does not work reliably, for the reasons mentioned in the docs you linked (incomplete graph).
There must be a way we can get this feature implemented in a dbt-esque way.
Putting the meta in the model's config block also doesn't seem to work as desired:
https://github.com/Datavault-UK/dbtvault/issues/106#issuecomment-1078984150
@gshank Odd that using a `config` block in the model doesn't work either; is the config processed after the model? That doesn't seem to make sense to me.
@DVAlexHiggs my understanding is that none of the config is treated as final until the full graph is parsed, i.e. there is no (pre)parser that would check if a part of the config is already safe to use -- as in the model case.
The easiest way forward would seem to be allowing users to define globals (or template/model locals) that cannot be changed by graph/template parsing; Jinja already has the Global Namespace and Template Globals for this.
Otherwise you would need to define the order in which each of the tags is expanded and do multiple passes of parsing, which seems nasty, since you would also need to check whether config is mutated and decide whether that is allowed and what should happen then. If I've understood the discussion here correctly, that is :)
Either your config is fixed before parsing the graph (as it used to be?), or you allow the graph to change the config. If you allow the graph to change the config, you cannot use it to alter the way the graph is parsed (and expect consistent results). So if you want to use some variables during the parsing, they need to be immutable by the parsing, as the current variables are. We seem to be looking for a way to have variables/config that is model-specific but immutable.
I went through some of the related issues/comments from before (https://github.com/dbt-labs/dbt-core/issues/2377#issuecomment-621906565, https://github.com/dbt-labs/dbt-core/issues/2401, https://github.com/dbt-labs/dbt-core/issues/1790), but none of those change the fact that variables are package-global and config can be changed by the graph; they just allow new locations for setting them. Consequently, the feature requested here should not apply to config, a subset of config (`meta`), or variables, but should be its own entity.
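The plain-Jinja globals mentioned above can be illustrated outside dbt. In this minimal jinja2 sketch (plain jinja2, not dbt behaviour), a value set on the environment is visible to every template at render time, while a `{% set %}` inside one template only shadows it locally; the shared global stays immutable, which is the property being asked for here.

```python
from jinja2 import Environment

# Plain-jinja2 sketch (not dbt): environment globals are visible to every
# template, and {% set %} inside a template only shadows the value locally.
env = Environment()
env.globals["source_model_name"] = "my_stage"

t1 = env.from_string("{{ source_model_name }}")
t2 = env.from_string("{% set source_model_name = 'other' %}{{ source_model_name }}")

print(t1.render())  # my_stage
print(t2.render())  # other (local shadow only)
print(t1.render())  # my_stage -- the shared global is unchanged
```

A model-scoped but parse-immutable setting would behave like `env.globals` here: readable everywhere, mutable by nothing that happens during template evaluation.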
@scajanus Thank you for looking into this! This makes sense and helps me to understand why it's not a simple feature to implement. Any chance we can explore this further @gshank ?
I've found a semi-working solution for this, though it's a bit messy due to the need for `if execute` and `-- depends_on`. Primarily, the need for `if execute` severely limits this approach, as users cannot make use of compile-only workflows, or of other useful dbt features which don't rely on `dbt run`.
```yaml
# properties.yml
version: 2

models:
  - name: my_meta_table
    meta:
      param_1: COLUMN_1
      param_2: COLUMN_2
      param_3: COLUMN_3
      param_4: COLUMN_4
      source_model_name: v_stg_orders
```
```sql
-- my_meta_table.sql
{{ config(materialized='table') }}

{% do log('meta: ' ~ config.get('meta'), true) %}
{% do log('param_1: ' ~ config.get('meta')['param_1'], true) %}
{% do log('param_2: ' ~ config.get('meta')['param_2'], true) %}
{% do log('param_3: ' ~ config.get('meta')['param_3'], true) %}
{% do log('param_4: ' ~ config.get('meta')['param_4'], true) %}
{% do log('source_model_name: ' ~ config.get('meta')['source_model_name'], true) %}

{% set param_1 = config.get('meta')['param_1'] -%}
{% set param_2 = config.get('meta')['param_2'] -%}
{% set param_3 = config.get('meta')['param_3'] -%}
{% set param_4 = config.get('meta')['param_4'] -%}
{% set source_model_name = config.get('meta')['source_model_name'] -%}

-- depends_on: {{ ref('v_stg_orders') }}

{% if execute %}
SELECT 1 AS "{{ param_1 }}"
     , 2 AS "{{ param_2 }}"
     , 3 AS "{{ param_3 }}"
     , 4 AS "{{ param_4 }}"
FROM {{ ref(source_model_name) }}
{% endif %}
```
Logs (note the `meta` values are empty at parse time, and only populated once execution starts):

```text
08:56:46  Running with dbt=1.0.6
08:56:46  meta:
08:56:46  param_1:
08:56:46  param_2:
08:56:46  param_3:
08:56:46  param_4:
08:56:46  source_model_name:
08:56:46  Found 32 models, 0 tests, 0 snapshots, 0 analyses, 497 macros, 0 operations, 0 seed files, 8 sources, 0 exposures, 0 metrics
08:56:46
08:56:48  Concurrency: 4 threads (target='dev')
08:56:48
08:56:48  1 of 1 START table model ALEX_HIGGS.my_meta_table............................... [RUN]
08:56:48  meta: {'param_1': 'COLUMN_1', 'param_2': 'COLUMN_2', 'param_3': 'COLUMN_3', 'param_4': 'COLUMN_4', 'source_model_name': 'v_stg_orders'}
08:56:48  param_1: COLUMN_1
08:56:48  param_2: COLUMN_2
08:56:48  param_3: COLUMN_3
08:56:48  param_4: COLUMN_4
08:56:48  source_model_name: v_stg_orders
08:56:52  1 of 1 OK created table model ALEX_HIGGS.my_meta_table.......................... [SUCCESS 1 in 3.66s]
08:56:52
08:56:52  Finished running 1 table model in 5.63s.
08:56:52
08:56:52  Completed successfully
08:56:52
08:56:52  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
```
Compiled SQL:

```sql
-- depends_on: SANDBOX.ALEX_HIGGS.v_stg_orders
SELECT 1 AS "COLUMN_1"
     , 2 AS "COLUMN_2"
     , 3 AS "COLUMN_3"
     , 4 AS "COLUMN_4"
FROM SANDBOX.ALEX_HIGGS.v_stg_orders
```
I can only second this feature request.
We are looking for a solution which would allow us to reuse as much as possible of our existing marts' models for new marts that are schematically identical, but which differ only in some filter or source model names. Thus, we seek to inject the mart's name into the models and into the ref() functions therein.
However, extracting the schema name of the current model is harder than I had expected:
I am mentioning this use case and the not-working solutions at this point, as I feel that the proposed solutions above fail for the same reason.
On a side note, I also want to mention that folder-level variables might serve some of the abovementioned use cases, as long as they would be available at compile time: https://github.com/dbt-labs/dbt-core/issues/4938#issuecomment-1191121432
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
This is too useful to be closed due to staleness...
Yes this should remain open
I'm confused as to why schema files are parsed after the model.
Furthermore, I can print the variable, just not use it. Which also feels odd.
This would be a powerful addition to dbt, and be highly valuable. ⬆
Voting for this as well
Voting for this
voting for this
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Is there an existing feature request for this?
Describe the Feature
It would be great if the `config.meta` property was populated at compile time. This would allow users to use the `meta` dictionary to generate SQL for their dbt models at scale, using the meta dictionary as parameters. For highly parameterised SQL via macros, such as in dbtvault, the following approaches are currently possible:

- an inline YAML string parsed with `fromyaml`
- parameters set directly in the model
- global variables in `dbt_project.yml`

All three of these approaches work, but aren't particularly scalable or easy to maintain. Some downsides of the above approaches:

- `fromyaml` is sensitive to formatting, and is harder to debug because it's a YAML string rather than a YAML file
- the `dbt_project.yml` file can quickly grow and become hard or even impossible to manage.

I requested a similar feature here, and I believe allowing `config` and `meta` to be declared in property (`.yml`) files was one of the answers to this idea originally. Whilst this is a fantastic feature and has ensured dbt has come on leaps and bounds with metadata management, I believe it's not quite there yet.

I recently tried using `meta` to replace the previously described approaches (`fromyaml`, params in model, global variables) and unfortunately came across the issue that we couldn't access the data at compile time. The snippets below demonstrate this:

Expected/Desired outcome
Current outcome (dbt 1.0.3)
I realise the original intent was probably that metadata would be static data to have alongside models, similar to tags and documentation, however there is a definite use case for it to drive SQL generation, as well.
Thank you!
Describe alternatives you've considered
Inline YAML via `fromyaml`, parameters set in the model, and global variables in `dbt_project.yml`, as described above.
Who will this benefit?
Anything else?