dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[CT-746] [Bug] Variables not getting into packages #5368

Open solomonshorser opened 2 years ago

solomonshorser commented 2 years ago

Is there an existing issue for this?

Current Behavior

I have a project that relies on a base package. Some of the values in that base package's dbt_project file should come from the parent project. Values for variables are passed to the parent package from the CLI, via the --vars... option. The base project does not seem to get the value.

In the parent dbt_project file:

vars:
  my_var: "{{ var('MY_VAR') }}"

In the base project:

vars:
  my_var: "{{ var('MY_VAR') }}"

When executing commands that use my_var in the base project, I get errors of the form:

Invalid project ID '{{ var('MY_VAR') }}'. Project IDs must contain 6-63 lowercase letters, digits, or dashes. Some project IDs also include domain name separated by a colon. IDs must start with a letter and may not end with a dash.

MY_VAR in this case is used to dynamically determine a BigQuery project ID, but in this case, you can see that the literal '{{ var('MY_VAR') }}' was used instead of the value.

I also tried specifying the project in the parent's vars block, but it did not seem to work:

vars:
  my_var: "{{ var('MY_VAR') }}"
  base_package:
    my_var: "{{ var('MY_VAR') }}"

This happens when I run dbt test. dbt compile does not have any problems.

Expected Behavior

My expectation is that the value from the CLI ... --vars '{ "MY_VAR": "some-value", ... } ' would be used in both the parent project and the base project.

This is based on the wording here: https://docs.getdbt.com/docs/building-a-dbt-project/building-models/using-variables

These vars can be scoped globally, or to a specific package imported in your project.

And also

Variables defined with the --vars command line argument override variables defined in the dbt_project.yml file. They are globally scoped and will be accessible to all packages included in the project.

(emphasis mine)

Did I misunderstand the documentation?

Is there a way to pass a variable's value from the CLI to the packages that a project depends on?

Steps To Reproduce

  1. Create a project that depends on another project.
  2. Both projects have a variable in dbt_project:
       vars:
         my_var: "{{ var('MY_VAR') }}"
  3. Run the parent project with CLI vars: `... --vars '{ "MY_VAR": "some_Value" }'`
  4. The parent project is able to reference MY_VAR, but the base project cannot.

Relevant log output

No response

Environment

- OS: Mac OS 12.4
- Python: 3.9.13
- dbt: 1.1.0

What database are you using dbt with?

bigquery

Additional Context

No response

solomonshorser commented 2 years ago

Is this issue related? https://github.com/dbt-labs/dbt-core/issues/2769

solomonshorser commented 2 years ago

I tried another approach, having package-scoped variables, but passing the value with a new variable:

Parent project:

vars:
  my_var: "{{var('MY_VAR')}}"
  base_package:
    parent_my_var: "{{var('MY_VAR')}}"

In base_package's dbt_project.yml:

vars:
  my_var: "{{var('parent_my_var')}}"

But this just results in a similar error:

Invalid project ID '{{ var('parent_my_var') }}'. ...

gshank commented 2 years ago

I'm not sure what the reason for this is, but the code explicitly says that cli vars won't be passed to the project creation for dependencies. In the 'new_project' method of RuntimeConfig, in core/dbt/config/runtime.py:

        # load the new project and its packages. Don't pass cli variables.
        renderer = DbtProjectYamlRenderer(profile)

gshank commented 2 years ago

It would be possible to pass in the cli_vars to dependency project creation, since 'load_dependencies' is called from RuntimeConfig (which should already have the cli_vars).
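To make the difference concrete, here is a toy model of the two behaviours described above. It is only a sketch: `ToyRenderer` and `load_dependency` are hypothetical names invented for illustration and are not dbt-core's classes or signatures. The point is simply that a renderer built without the CLI vars has nothing to look `MY_VAR` up in.

```python
from typing import Any, Dict, Optional


class ToyRenderer:
    """Hypothetical stand-in for a project YAML renderer (not dbt's real class)."""

    def __init__(self, cli_vars: Optional[Dict[str, Any]] = None) -> None:
        # Copy so that later changes to the caller's dict do not leak in.
        self.cli_vars: Dict[str, Any] = dict(cli_vars or {})

    def lookup(self, name: str) -> Any:
        if name not in self.cli_vars:
            raise KeyError(f"var {name!r} is not available to this renderer")
        return self.cli_vars[name]


def load_dependency(cli_vars: Dict[str, Any], pass_cli_vars: bool) -> ToyRenderer:
    """Build the renderer for a dependency, with or without the CLI vars."""
    return ToyRenderer(cli_vars if pass_cli_vars else None)


cli_vars = {"MY_VAR": "some-value"}
current = load_dependency(cli_vars, pass_cli_vars=False)   # behaviour quoted above
proposed = load_dependency(cli_vars, pass_cli_vars=True)   # the suggested change
```

In this toy, `proposed.lookup("MY_VAR")` finds the value, while `current.lookup("MY_VAR")` raises `KeyError`.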

jtcohen6 commented 2 years ago

Ah, I think what's actually happening here is that vars aren't rendered today! So you can put whatever Jinja you want in there:

# dbt_project.yml
vars:
  my_var: "{{ 'val_one' if target.name == 'prod' else 'val_two' }}"

But it won't actually be rendered when dbt_project.yml is loaded—it will just be stored as the raw string. In some rendering contexts (i.e. model compilation), that raw string will be rendered later on.
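A minimal sketch of that distinction, using a toy regex substitution as a stand-in for Jinja (dbt uses Jinja itself; `toy_render` and `VAR_CALL` are hypothetical helpers written only for this illustration):

```python
import re
from typing import Dict

# Matches a literal {{ var('NAME') }} expression; a toy stand-in for Jinja.
VAR_CALL = re.compile(r"\{\{\s*var\(\s*'([^']+)'\s*\)\s*\}\}")


def toy_render(raw: str, cli_vars: Dict[str, str]) -> str:
    """Replace each {{ var('NAME') }} with the matching CLI value."""
    return VAR_CALL.sub(lambda m: cli_vars[m.group(1)], raw)


# What is stored when dbt_project.yml is loaded: the raw, unrendered string.
project_vars = {"my_var": "{{ var('MY_VAR') }}"}
cli_vars = {"MY_VAR": "some-value"}

# A context that never renders sees the literal text ...
unrendered = project_vars["my_var"]
# ... while a context that renders later sees the substituted value.
rendered = toy_render(project_vars["my_var"], cli_vars)
```

Here `unrendered` is the same literal that shows up in the "Invalid project ID" error, which is consistent with the value being consumed somewhere that does not render it.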

When executing commands that use my_var in the base project

One other thing: If you install other projects as packages, it is still always expected that you're executing dbt from the top-level / root project. (That's definitionally true: The root project is always the one from/in which you are invoking dbt.) The vars defined in that dbt_project.yml can be used to reconfigure resources from those packages, but they will not be "passed down" if you're actually invoking dbt from within those packages.

solomonshorser commented 2 years ago

So... If I want to pass a variable to parent_project from the command line (while executing dbt in the root directory of parent_project), and the value of that variable also needs to be accessed in base_package (which parent_project depends on), what's the best way to do this?

It sounds like I might need to have some pre-dbt shell script that modifies parent_project/dbt_packages/base_package/dbt_project.yml and injects the values that way, but I'm hoping there's a better way that I'm just missing.
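For what it's worth, a rough sketch of what such a pre-dbt step could look like, written in Python rather than shell. Everything here is an assumption for illustration: the helper names are made up, the substitution is plain text replacement rather than YAML-aware editing, and re-running `dbt deps` would likely overwrite the modified file.

```python
import re
from pathlib import Path
from typing import Dict

# Matches a literal {{ var('NAME') }} placeholder in the file text.
PLACEHOLDER = re.compile(r"\{\{\s*var\(\s*'([^']+)'\s*\)\s*\}\}")


def inject_vars_text(text: str, values: Dict[str, str]) -> str:
    """Replace known placeholders with concrete values; leave unknown ones alone."""
    def substitute(match: "re.Match[str]") -> str:
        return values.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, text)


def inject_vars_file(project_file: Path, values: Dict[str, str]) -> None:
    """Rewrite a package's dbt_project.yml in place (hypothetical helper)."""
    project_file.write_text(inject_vars_text(project_file.read_text(), values))


example = "vars:\n  my_var: \"{{ var('MY_VAR') }}\"\n"
patched = inject_vars_text(example, {"MY_VAR": "some-value"})
```

It would be called before dbt with something like `inject_vars_file(Path("dbt_packages/base_package/dbt_project.yml"), {"MY_VAR": "some-value"})`, using the path mentioned above.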

solomonshorser commented 2 years ago

@jtcohen6

The vars defined in that dbt_project.yml can be used to reconfigure resources from those packages, but they will not be "passed down" if you're actually invoking dbt from within those packages.

I was invoking dbt from the directory of parent_project, but I was using a selector that referenced source resources that exist in base_package, to test the sources before building the models defined in base_package. I guess one solution would be to invoke dbt from base_package if I'm specifically interested in resources that only exist in that package. It would make execution a little weird, cd'ing back and forth from parent_project to base_package... It might work...

solomonshorser commented 2 years ago

The solution I ended up using:

Remove the variables from base_package's dbt_project file. I lose the ability to specify default values if a variable is not passed in from the CLI, and since the variables are not clearly defined in the dbt_project, they need to be clearly explained in a README or something. It's not ideal, but it works.
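For context, here is a sketch of the lookup order for `var()` as I understand it from the dbt variables documentation: CLI vars first, then the root project's package-scoped vars, then its global vars, then the package's own dbt_project.yml, and finally the default argument passed to `var()`. `resolve_var` is a hypothetical helper, not dbt's implementation.

```python
from typing import Any, Dict

_MISSING = object()  # sentinel so that None can still be a legitimate default


def resolve_var(
    name: str,
    *,
    cli_vars: Dict[str, Any],
    root_package_scoped: Dict[str, Any],
    root_global: Dict[str, Any],
    package_own: Dict[str, Any],
    default: Any = _MISSING,
) -> Any:
    """Return the first value found, searching from highest to lowest precedence."""
    for scope in (cli_vars, root_package_scoped, root_global, package_own):
        if name in scope:
            return scope[name]
    if default is not _MISSING:
        return default
    raise KeyError(f"var {name!r} is not defined and no default was given")


# With the package's own vars removed, the CLI value is found directly.
from_cli = resolve_var(
    "MY_VAR",
    cli_vars={"MY_VAR": "some-value"},
    root_package_scoped={},
    root_global={},
    package_own={},
    default="fallback-project",
)

# Without a CLI value, an in-line default is the only fallback left.
from_default = resolve_var(
    "MY_VAR",
    cli_vars={},
    root_package_scoped={},
    root_global={},
    package_own={},
    default="fallback-project",
)
```

If that order holds, passing a second argument to `var()` in the package's resources is one way to keep a default without declaring the variable in the package's dbt_project file.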

prgx-aeveri01 commented 2 years ago

I'm having a similar issue here: I'm defining a conditional var value for a database name in my dbt_project.yml and using it in my sources YAML, but when I use one of those sources via the {{ source() }} function in a model, the raw string is what gets used rather than the rendered value.

fivetran-jamie commented 2 years ago

just ran into an issue that sounds related, but it could maybe be specific to using variables to dynamically enable sources from the src.yml file

basically, in our Hubspot Source package, we have some variables to disable models associated with the Hubspot Service endpoint (and many other tables, but i'll just refer to service ones for this example).

In the package's dbt_project.yml file, we set the hubspot_service_enabled var to False by default, so service tables are not run. When a user installs the package in their project, they are able to overwrite hubspot_service_enabled in their root dbt_project.yml file to set it to True and run service-related models.

However, the source enabled config is not able to be overwritten this way and is not capturing the new variable value. So, if you set the hubspot_service_enabled var to True, you'll have errors about models trying to select from a source node that is disabled.

similar to @solomonshorser, we had to remove the variable from the package's dbt_project.yml, which is a little bit of a bummer as it's nice to have all of the default values for the variables easily accessible in one place (rather than in-line) but perhaps that's better suited for the README. I remember @jtcohen6 saying you shouldn't add variable values to a package dbt_project.yml, but this seems kinda funky

github-actions[bot] commented 1 year ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

solomonshorser commented 1 year ago

Is any further work happening on this?

github-actions[bot] commented 9 months ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.