Closed DVAlexHiggs closed 4 years ago
hey @DVAlexHiggs - cool idea! Unfortunately, I don't think this is something we're going to add support for.
If you check out #2312 , you can see that we very recently made a ton of changes around variable scoping. Historically, variables were scoped at the folder-level. This meant that you could do something like:
models:
my_project:
path1:
vars: {myvar: abc123}
path2:
vars: {myvar: def456}
In this setup, a model in models/path1
would get a variable called myvar
with the value abc123
whereas a model in models/path2
would get a value of def456
. This made it really tricky to correctly build compilation contexts for different node types. Imagine you have a schema.yml
file in models/path2
which describes a model in models/path1
-- what do you think the myvar
value should evaluate to in that context?
To resolve inconsistencies like this, we've added a vars:
block to the dbt_project.yml
file which cannot be scoped to a particular path. It looks like:
vars:
myvar: abc123
models:
...
Note though that variables can still be scoped to packages:
vars:
package1:
myvar: abc123
package2:
myvar: abc123
I just wanted to give some background on how we're thinking about variables, and some of the recent changes that impact variable usage. Given these changes, I don't think we'll be able to add model-scoped variables like you've described here. I do however think your use case is a good one, and I can imagine two paths forward:
schema.yml
filesThis would look like:
models:
- name: my_model
config:
key1: <value1>
key2: <value1>
We could then provide a way to tap into these configuration values in the model .sql file context, eg:
select {{ config.get('key1') }} from ....
dbt_project.yml
fileSee #1790 for more info
Curious to hear what you think about these changes and options
Hey @drewbanin thanks for your response. I think both of these solutions have their merits. Solution 1 is more dbt-esque and provides more functionality to the current schema.yml
, and keeps all metadata/information about models in one place.
I think solution 2 is a powerful alternative with a lot of versatility, but perhaps more care than usual would need to be taken with directory structures and organisation of the YAML files. I think personally I would prefer solution 1.
HI, any news on this? Really excited if this is being considered. I'm happy to contribute it myself as well, though I would need a point in the right direction to start with. Perhaps we could talk about this?
Ok - thanks for the additional context for this one on Slack @DVAlexHiggs.
To recap: You're leveraging the datavault package which relies on vars
to configure models pks, natural keys, load timestamps, etc. I do think the dbt_project.yml
file was a poor place for configs like that historically, so in light of the vars changes in 0.17.0, we should figure out a better path forwards.
I really like the idea of allowing model configs in .yml files. We have a dedicated issue for this over here: https://github.com/fishtown-analytics/dbt/issues/2401. I think #2401 was inspired by this very issue :)
config
, not vars
:version: 2
models:
- name: <model name>
description: <description>
docs:
show: <true|false>
meta:
key1: <value1>
key2: <value2>
config: // Like this
key1: <value1>
key2: <value2>
We'll need to think about the order of parsing. I think a sensible order of operations is:
dbt_project.yml
configsschema.yml
file configs.sql
file configsIf we make the configs from the dbt_project.yml
file and schema.yml
files available in the .sql
file context, then you could simply use config.get()
instead of var()
to capture these config values. This should address your use case with the added benefit of deflating the size of your dbt_project.yml
file which is, i bet, huge for large dbt projects on datavault :)
@drewbanin I think the order you've specified makes sense. This will be a huge help for uber-templated models
Yes the order makes sense, and I agree that config
is a better way forward
Closing in favor of #2401
Describe the feature
It would be excellent if it were possible to provide vars to models in schema.yml files like so:
Additional context
I'm suggesting this feature because the
dbt_project.yml
file gets considerably large and hard to manage at scale, when needing to provide variables to multiple models. It would be a lot easier to manage if there were a way to split the variables into sub-files, and I thought theschema.yml
files would be a perfect candidate, as they already exist and are designed to hold other information related to models.Who will this benefit?
This would mainly benefit larger companies who have created libraries of macros/packages with arguments whose values differ between models, and would prefer not to fill their model files with the setting of variables, but find it too difficult to manage if it's all in the
dbt_project.yml