dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.46k stars 1.58k forks source link

[Regression] all --vars of run_results artifact are casted to str but they were not before #10421

Open fabientra opened 1 month ago

fabientra commented 1 month ago

Is this a regression in a recent version of dbt-core?

Current Behavior

Our dbt retry using a variable from CLI of type INT are failing because during the retry dbt sees it as STR.

I compared the run_results.json from dbt 1.8 to run_results.json from dbt 1.7 and I can see that the one from 1.8 have all the vars as STR whereas from 1.7 they kept their original type.

Expected/Previous Behavior

vars in run_results.json should keep the same type as the one give through the CLI

Steps To Reproduce

  1. with dbt 1.8, run a dbt run command with a variable of type int
  2. look at the run_results.json artifact

Relevant log output

No response

Environment

- OS:MacOS Sonoma 14.5
- Python: Python 3.10.11
- dbt (working version): 1.7
- dbt (regression version): 1.8

Which database adapter are you using with dbt?

redshift

Additional Context

I believe the regression is caused by this PR that applies the scrubbing function to ALL cli variables whereas this function cast its input to str systematically

def scrub_secrets(msg: str, secrets: List[str]) -> str:
    scrubbed = str(msg)

    for secret in secrets:
        scrubbed = scrubbed.replace(secret, "*****")

    return scrubbed
dbeatty10 commented 1 month ago

Thanks for reporting this @fabientra and identifying the originating PR 👑

I was able to reproduce this (see details in "Reprex" below).

Possible fix

A possible fix might be leaving msg as-if here unless there were any secrets scrubbed:

def scrub_secrets(msg: str, secrets: List[str]) -> str:
    scrubbed = str(msg)

    for secret in secrets:
        scrubbed = scrubbed.replace(secret, "*****")

    return msg if str(msg) == scrubbed else scrubbed

i.e. just replace the return statement here with this:

    return msg if str(msg) == scrubbed else scrubbed

Of course you'll probably still get an error during dbt retry if you included a secret within your --vars, but that's acceptable because: 1) we'd hope folks don't include secrets in --vars, and 2) we explicitly don't want to write out secrets in plain text (which would be required for retry).

### Reprex Start with a `models/my_model.sql` like this: ```sql {% if execute %} {{ exceptions.raise_compiler_error("Force a failure") }} {% endif %} {% if not var("id") is number %} {{ exceptions.raise_compiler_error("var('id') is not a number") }} {% endif %} select 1 as id ``` And try to run it like this (which will yield an error): ```shell dbt run -s my_model --vars '{"id": 2}' ``` Then remove these first 3 lines from `models/my_model.sql` so that it no longer will error this specific way: ```sql {% if execute %} {{ exceptions.raise_compiler_error("Force a failure") }} {% endif %} ``` And retry it: ```shell dbt retry ```
dbeatty10 commented 1 month ago

For completeness, we'd probably want to update the type hints as well:

def scrub_secrets(msg: Any, secrets: List[str]) -> Any:
fabientra commented 1 month ago

Thank you for the fix @dbeatty10 :)