dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.79k stars 1.62k forks source link

[CT-2738] [Feature] Programmatic invocations with supplied manifest should perform partial parsing state check #7945

Open AlexJoz opened 1 year ago

AlexJoz commented 1 year ago

Is this a new bug in dbt-core?

Current Behavior

Programmatic invocation overrides variables only in SQL models, while Python models continue to read values from the configuration. In contrast, a pure CLI run works as expected by overriding all the values in both models.

Expected Behavior

Both python and sql models use overridden (--var) values when invoked by dbtRunner

Steps To Reproduce

sql model:

select {{ var("conf_variable") }}

python model:

def model(dbt, session):
    a = dbt.config.get("conf_variable")
    ...

both models config:

    config:
      conf_variable: '{{ var("conf_variable") }}'

cli invocation: dbt run --vars '{"key": "value", "conf_variable": "1212"}'

programmatic:

from dbt.cli.main import dbtRunner, dbtRunnerResult
from dbt.contracts.graph.manifest import Manifest

dbt_parsed: dbtRunnerResult = dbtRunner().invoke(["parse"])
graph_manifest: Manifest = dbt_parsed.result
runner = dbtRunner(manifest=graph_manifest)

res= runner.invoke(["run", "--vars", '{"key": "value", "conf_variable": "1212"}'])

Relevant log output

No response

Environment

- OS: Ubuntu 22.04
- Python:3.10
- dbt: 1.5.2

Which database adapter are you using with dbt?

other (mention it in "Additional Context")

Additional Context

duckdb

jtcohen6 commented 1 year ago

@AlexJoz Thanks for opening!

I don't think this has to do with SQL models vs. Python models, but rather about:

-- models/my_model.sql
{{ config(
  conf_variable = var("conf_variable")
) }}

-- Variable from config:  {{ config.conf_variable }}
-- Variable itself:       {{ var("conf_variable") }}

select 1 as id
from dbt.cli.main import dbtRunner
from dbt.contracts.graph.manifest import Manifest

abc_manifest: Manifest = dbtRunner().invoke(["parse", "--vars", "conf_variable: abc"]).result
abc_runner = dbtRunner(manifest=abc_manifest)
abc_runner.invoke(["compile", "--vars", "conf_variable: xyz", "--select", "my_model"])
...
08:50:31  Compiled node 'my_model' is:

-- Variable from config:  abc
-- Variable itself:       xyz

select 1 as id
...

There are a few ways we could look to handle this automatically, both of them at least a little tricky:

  1. (Relatively smaller lift) Rather than entirely skipping parsing here, trigger a re-parse if --vars are passed in, and differ from the vars stored on the manifest. Basically, construct the ManifestStateCheck and compare it to manifest.ManifestStateCheck (the one that's been passed in), just like we do for partial parsing.
  2. (Much bigger lift) At parse time, while constructing the manifest, store an executable of the var call, instead of the resolved value itself. Then, always reevaluate the var call given the values that have been passed in.

Programmatic invocations are "full control" mode—you're in charge. So if you're going to change the value of --vars being passed in, you always need to trigger a re-parse.

from dbt.cli.main import dbtRunner
from dbt.contracts.graph.manifest import Manifest

abc_vars = "conf_variable: abc"
abc_manifest: Manifest = dbtRunner().invoke(["parse", "--vars", abc_vars]).result
abc_runner = dbtRunner(manifest=abc_manifest)
# still need to supply --vars (not just reused from Manifest; this could be a future enhancement)
abc_runner.invoke(["compile", "--vars", abc_vars, "--select", "my_model"])

# -- Variable from config:  abc
# -- Variable itself:       abc

xyz_vars = "conf_variable: xyz"
# need to trigger a re-parse because --vars have changed
xyz_manifest: Manifest = dbtRunner().invoke(["parse", "--vars", xyz_vars]).result
xyz_runner = dbtRunner(manifest=xyz_manifest)
xyz_runner.invoke(["compile", "--vars", xyz_vars, "--select", "my_model"])

# -- Variable from config:  xyz
# -- Variable itself:       xyz
AlexJoz commented 1 year ago

@jtcohen6

Programmatic invocations are "full control" mode—you're in charge.

Seems, I haven't dived deep enough yet))

Thanks for the quick feedback! I can confirm that it works with the proposed way of doing this.