dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

Invoking dbt as a module #2013

Closed: gouline closed this issue 2 years ago

gouline commented 4 years ago

Describe the feature

It would be nice to have a straightforward way of running dbt as a Python module.

To give some context, I use my own build tool, https://github.com/gouline/molot, which is written in Python and provides basic support for targets, dependencies and arguments: something halfway between Make and Gradle that helps with CI configuration. Things that can be done in Python (e.g. boto3, snowflake-connector-python) are done natively; for everything else, it calls shell commands in a subprocess, as a Makefile would.

So I'd like a way to invoke multiple dbt commands (e.g. run and docs generate) in one target by importing dbt.main and calling a function multiple times. As things are now, main() calls sys.exit(), which exits my outer build as well, and handle_and_check() requires the raw command-line arguments to be passed manually as a list. Ideally, there would be a function of the form dbt.main.execute(command='run', profiles_dir=PROFILES_DIR, ...) that executes a command and raises any exceptions for the caller to handle.
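A minimal sketch of what the proposed execute(command=..., ...) could look like, assuming it simply translates keyword arguments into CLI flags (underscores become dashes, so profiles_dir becomes --profiles-dir) and hands the resulting list to an entry point that may call sys.exit(). The names execute, build_argv and DbtCommandError are hypothetical, not part of dbt, and the entry point is passed in explicitly as cli_main so the wrapper can be exercised without dbt installed.

```python
import sys
from typing import Callable, List, Optional, Sequence


class DbtCommandError(Exception):
    """Hypothetical error raised when the wrapped CLI exits with a non-zero code."""

    def __init__(self, argv: Sequence[str], code: int) -> None:
        super().__init__(f"{' '.join(argv)!r} exited with code {code}")
        self.argv = list(argv)
        self.code = code


def build_argv(command: str, **options: object) -> List[str]:
    """Turn execute(command='docs generate', profiles_dir='x') into CLI arguments.

    Assumed convention: underscores become dashes, True becomes a bare flag,
    None/False are skipped, and every other value is passed as a string.
    """
    argv = command.split()  # "docs generate" -> ["docs", "generate"]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)
        elif value is None or value is False:
            continue
        else:
            argv.extend([flag, str(value)])
    return argv


def execute(
    cli_main: Callable[[List[str]], Optional[int]], command: str, **options: object
) -> None:
    """Run a CLI-style main() without letting sys.exit() end the caller's process."""
    argv = build_argv(command, **options)
    try:
        code = cli_main(argv)
    except SystemExit as exc:  # sys.exit() raises SystemExit; convert it below
        code = exc.code
    # SystemExit.code may be None (success), an int, or a message string (failure)
    if code is None:
        code = 0
    elif not isinstance(code, int):
        code = 1
    if code != 0:
        raise DbtCommandError(argv, code)
```

With pre-1.5 dbt-core, cli_main could be dbt.main.main, which takes the argument list and ends with sys.exit(); a build target would then call execute once for "run" and once for "docs generate" and handle DbtCommandError as it sees fit.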

Describe alternatives you've considered

My current workaround is invoking the dbt command as a subprocess, but that seems like a backward way of doing it, considering that both are Python applications.
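For comparison, the subprocess workaround can be sketched as follows, assuming the dbt executable is on PATH; run_dbt_subprocess is a hypothetical helper name. Passing check=True to subprocess.run makes a non-zero exit status raise subprocess.CalledProcessError, so failures surface as exceptions in the build tool instead of ending it.

```python
import subprocess
from typing import List, Sequence


def run_dbt_subprocess(args: Sequence[str], executable: str = "dbt") -> str:
    """Run `dbt <args>` in a child process and return its standard output.

    Raises subprocess.CalledProcessError if the command exits with a non-zero status.
    The `executable` parameter is only there so another program can be swapped in.
    """
    cmd: List[str] = [executable, *args]
    completed = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return completed.stdout
```

Usage would be along the lines of run_dbt_subprocess(["run", "--profiles-dir", PROFILES_DIR]) followed by run_dbt_subprocess(["docs", "generate"]). It works, but each call pays for a fresh interpreter start-up, which is part of why an in-process entry point is attractive.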

Additional context

Closest thing I could find on the issue tracker was https://github.com/fishtown-analytics/dbt/issues/1488, but that sounds more complex than what I'm proposing.

Who will this benefit?

This would benefit anyone who orchestrates their builds with Python scripts.

drewbanin commented 4 years ago

Hey @gouline - kind of funny we didn't already have an issue for this one :)

We've kicked around the idea of providing a public API for dbt in Python a couple of times now. I'm happy for us to add a public method like dbt.main.execute(command='run', profiles_dir=PROFILES_DIR, ...) which implements logic similar to handle_and_check.

I'd still like to provide a rich Python-based interface for running dbt projects in the future. I think that would entail model selection, configuration, execution, etc, etc. In this case though, I think a top-level method like execute (or similar) gets us moving in the right direction.

Thanks for raising this!

Dandandan commented 4 years ago

Hello, I'd like to help out on this issue. Any pointers on how this should be implemented?

Should it just pass on the args to the parser?

picousse commented 3 years ago

This one is pretty old. Is there any progress on this?

Is there something I/we could help with?

mik-laj commented 2 years ago

sys.exit() raises a SystemExit exception, so we can use a try/except statement to handle this situation. Here is an example (the command and its test): https://github.com/apache/airflow/blob/8505d2f0a4524313e3eff7a4f16b9a9439c7a79f/airflow/cli/commands/config_command.py#L40-L44 https://github.com/apache/airflow/blob/8505d2f0a4524313e3eff7a4f16b9a9439c7a79f/tests/cli/commands/test_config_command.py#L60-L80

To capture stdout/stderr, we can use the contextlib.redirect_stdout/contextlib.redirect_stderr context managers.

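```python
import contextlib, io
from dbt.main import main  # legacy entry point (dbt-core before 1.5); main(args) ends with sys.exit()
with contextlib.redirect_stdout(io.StringIO()) as temp_stdout:
    main(argv)  # argv: list of CLI arguments, e.g. ["run"]
```
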
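Combining the two suggestions above, here is a sketch of a helper that runs any sys.exit()-calling entry point in-process, captures what it prints, and returns the exit code instead of terminating the caller. run_captured is a hypothetical name; with pre-1.5 dbt-core the entry point would be dbt.main.main. Since redirect_stdout and redirect_stderr only swap sys.stdout and sys.stderr, output from logging handlers or child processes that hold a reference to the original streams may not be captured.

```python
import contextlib
import io
from typing import Callable, List, Optional, Tuple


def run_captured(
    cli_main: Callable[[List[str]], Optional[int]], argv: List[str]
) -> Tuple[int, str, str]:
    """Call a CLI-style main(argv) in-process and return (exit_code, stdout, stderr)."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        try:
            result = cli_main(argv)
        except SystemExit as exc:  # sys.exit() raises SystemExit; catch it so we keep running
            result = exc.code
    # Mirror the interpreter: None means success, an int is the exit status,
    # and any other value is written to stderr and treated as exit status 1.
    if result is None:
        code = 0
    elif isinstance(result, int):
        code = result
    else:
        err.write(f"{result}\n")
        code = 1
    return code, out.getvalue(), err.getvalue()
```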
ismailsimsek commented 2 years ago

+1 Is there any progress on this?

WillemSFV commented 2 years ago

+1 Is there any progress on this?

I suspect there won't be much progress on this now that dbt Cloud's a thing.

github-actions[bot] commented 2 years ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

github-actions[bot] commented 2 years ago

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.

ismailsimsek commented 2 years ago

+1

jtcohen6 commented 2 years ago

This is a fairly old issue. We are finally making progress in this direction, providing a programmatic API into dbt Core: https://github.com/dbt-labs/dbt-core/issues/5527

The ability to invoke dbt as a module (instead of as a CLI script) isn't explicitly in scope for that initiative, but the big idea, providing a more sensible "main" method as the entry point to Core execution, certainly is.

manugarri commented 1 year ago

Any updates on this? It seems like it would be easy to implement, given that all of dbt-core is already Python.

jtcohen6 commented 1 year ago

@manugarri Check out:

joellabes commented 9 months ago

This issue has better Google results than the actual feature's docs, so I'm going to leave a pointer here: https://docs.getdbt.com/reference/programmatic-invocations
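For dbt-core 1.5 and later, the linked page documents dbtRunner as the programmatic entry point. A minimal sketch along the lines of that documentation follows; run_models is a hypothetical helper, and dbt is imported inside the function so the module loads even where dbt-core is not installed. Instead of calling sys.exit(), invoke() returns a dbtRunnerResult whose success, exception and result attributes the caller can inspect.

```python
from typing import List


def run_models(select: str) -> bool:
    """Run `dbt run --select <select>` in-process (dbt-core >= 1.5) and report node statuses."""
    # Imported lazily so this module can be loaded without dbt-core installed
    from dbt.cli.main import dbtRunner, dbtRunnerResult

    cli_args: List[str] = ["run", "--select", select]
    res: dbtRunnerResult = dbtRunner().invoke(cli_args)  # returns a result, does not exit
    if res.exception is not None:
        print(f"dbt raised: {res.exception!r}")
    if res.result is not None:
        # For `run`, each item carries the executed node and its status
        for r in res.result:
            print(f"{r.node.name}: {r.status}")
    return res.success
```

A build target could then call run_models("tag:nightly") and decide what to do when it returns False, without dbt ending the outer process.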