dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
10.01k stars 1.63k forks source link

[Bug] Can't pass `args` for `run-operation` in programmatic invocation as kwarg #10355

Open ben-schreiber opened 5 months ago

ben-schreiber commented 5 months ago

Is this a new bug in dbt-core?

Current Behavior

Passing the --args parameter as a kwarg to the dbtRunner.invoke method throws a TypeError. This forces the parameter to only be pass as:

dbtRunner().invoke(['run-operation', '--args', '...'])

This looks to be caused by a naming conflict with the invoke method here

Expected Behavior

To also be able to pass it as a kwarg

dbtRunner().invoke(['run-operation'], args=...)

Steps To Reproduce

from dbt.cli.main import dbtRunner

runner = dbtRunner()
runner.invoke(['run-operation', 'some_macro'], args={'arg1': 'foo', 'arg2': 'goo'})

Relevant log output

TypeError
...
TypeError: dbtRunner.invoke() got multiple values for argument 'args'

Environment

- OS: MacOS Sonoma 14.4
- Python: 3.10.12
- dbt: 1.8.3

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

ben-schreiber commented 5 months ago

Here are two suggestions which came to mind:

  1. The args parameter could be renamed. Something like,

    def invoke(self, invocation_args: List[str], **kwargs) -> dbtRunnerResult: ...

    Although, this would be a breaking change and would need to be handled accordingly.

  2. To add a dedicated name for this use-case (e.g. macro_args or something like that) which then would be translated back to args

    def invoke(self, args: List[str], **kwargs) -> dbtRunnerResult:
    if 'macro_args' in kwargs:
        kwargs['args'] = kwargs.pop('macro_args')

    This too would have to be appropriately documented since it breaks from the kwargs convention of this method.

dbeatty10 commented 4 months ago

Thanks for reporting this along with the root cause and some ideas @ben-schreiber !

Root cause

Agreed that the cause is the naming conflict between the positional parameter args in the invoke method and the --args CLI flag for the run-operation subcommand.

Workaround

Pass in strings for the --args CLI flag like this:

runner.invoke(
    ["run-operation", "some_macro", "--args", "{'arg1': 'foo', 'arg2': 'goo'}", "--vars", "{'some_var': 'doog'}"]
)
### Reprex and examples `macros/some_macro.sql` ```sql {% macro some_macro(arg1, arg2) %} {{ log("arg1: " ~ arg1, True) }} {{ log("arg2: " ~ arg2, True) }} {{ log("some_var: " ~ var("some_var", "loo"), True) }} {% endmacro %} ``` `some_script.py` ```python from dbt.cli.main import dbtRunner runner = dbtRunner() # Works runner.invoke( ["run-operation", "some_macro", "--args", "{'arg1': 'foo', 'arg2': 'goo'}", "--vars", "{'some_var': 'doog'}"] ) # Also works runner.invoke( args=["run-operation", "some_macro", "--args", "{'arg1': 'foo', 'arg2': 'goo'}", "--vars", "{'some_var': 'doo'}"] ) # # Doesn't work due to naming conflict with the `args` parameter # runner.invoke(["run-operation", "some_macro"], args={"arg1": "foo", "arg2": "goo"}, vars={"some_var": "doo"}) ``` ```shell dbt run-operation some_macro --args "{'arg1': 'foo', 'arg2': 'goo'}" --vars "{'some_var': 'doo'}" python some_script.py ```
### Monkey patch example While creating a monkey patch isn't what we'd recommend or advocate for in this scenario, sharing this code example for sake of completeness. ```python from typing import List from dbt.cli.main import dbtRunnerResult from dbt.cli import requires from click.exceptions import BadOptionUsage from click.exceptions import NoSuchOption, UsageError from click.exceptions import Exit as ClickExit from dbt.cli.main import cli from dbt.cli.main import dbtRunner runner = dbtRunner() # Define a new version of the invoke method def new_invoke(self, invocation_args: List[str], **kwargs) -> dbtRunnerResult: try: dbt_ctx = cli.make_context(cli.name, invocation_args.copy()) dbt_ctx.obj = { "manifest": self.manifest, "callbacks": self.callbacks, } for key, value in kwargs.items(): dbt_ctx.params[key] = value # Hack to set parameter source to custom string dbt_ctx.set_parameter_source(key, "kwargs") # type: ignore result, success = cli.invoke(dbt_ctx) return dbtRunnerResult( result=result, success=success, ) except requires.ResultExit as e: return dbtRunnerResult( result=e.result, success=False, ) except requires.ExceptionExit as e: return dbtRunnerResult( exception=e.exception, success=False, ) except (BadOptionUsage, NoSuchOption, UsageError) as e: return dbtRunnerResult( exception=DbtUsageException(e.message), success=False, ) except ClickExit as e: if e.exit_code == 0: return dbtRunnerResult(success=True) return dbtRunnerResult( exception=DbtInternalException(f"unhandled exit code {e.exit_code}"), success=False, ) except BaseException as e: return dbtRunnerResult( exception=e, success=False, ) # Monkey patch the invoke method dbtRunner.invoke = new_invoke # The rest of your code goes below ```
dbeatty10 commented 4 months ago

Options

  1. Rename the args parameter of invoke to something unique (e.g., invocation_args) accepting that it will break for anyone using the positional argument as a named parameter.
  2. Add a dedicated name for this use-case (e.g. macro_args) accepting the complications in the code and user experience
  3. Leave things as-is accepting that using the --vars flag for run-operation requires the workaround above or monkey patching the definition of invoke.

The first option of giving it a unique name like invocation_args seems most attractive to me. I'm guessing that it would be uncommon for folks to use a named parameter within calls to invoke(), and none of our code examples use the named parameter.

If we went this route, we would call it out in a migration guide. We'd also want to take care to choose a name that is unique enough that we'd be unlikely to ever add a CLI flag with the same name. See here for an example script that can get the names of all the CLI flags.

@ben-schreiber To help us understand the trade-offs here, could you share more about the effect on the user experience if we leave it as-is and users only have the workaround you mentioned?

ben-schreiber commented 4 months ago

@dbeatty10 I agree with you that option one is ideal. I would extend it even further to update the method's signature to include /:

def invoke(self, invocation_args: List[str], /, **kwargs) -> dbtRunnerResult: ...

thereby formalizing the desire to separate the usage of invocation_args (positional only) from kwargs (keyword only) in code. This would also allow the invocation_args argument to be be given a unique name without worrying as much about readability/usability for end users.

In regards to your question, the effect is not terribly large as long as this edge case is clearly documented. I encountered the issue when building a wrapper around DBT's programmatic invocations for Apache Airflow. In that use case, passing keyword arguments to invoke is more natural than converting Pythonic objects to the CLI representation.

ben-schreiber commented 4 months ago

Also, here is another/shorter monkeypatch:


from typing import List
from dbt.cli.main import dbtRunnerResult, dbtRunner

invoke_v1 = dbtRunner.invoke

def invoke_v2(self, invocation_args: List[str], /, **kwargs) -> dbtRunnerResult:
    args_copy = list(invocation_args)
    kwargs_copy = dict(kwargs)
    if 'args' in kwargs_copy:
         args_copy += ['--args', kwargs_copy.pop('args')]
    return invoke_v1(self, args_copy, **kwargs_copy)

dbtRunner.invoke = invoke_v2