dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[CT-3142] [Feature] Make the current git branch (if any) available in the dbt Jinja context #8690

Open b-per opened 1 year ago

b-per commented 1 year ago

Is this your first time submitting a feature request?

Describe the feature

I would like to be able to access the current git branch name from the dbt context in order to be able to run some Jinja code depending on its value.

If we had current_branch available as a Jinja variable, we could potentially generate models in schemas/database (by adding some logic in generate_schema_name()) that depend on this branch name.

This could be useful for long-lived branches where multiple developers work on the same feature.

We would need to handle the case where dbt is run on code that is not a git repo/branch and maybe return an empty value then.
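As a rough illustration of the schema logic described above, here is a minimal Python sketch. It is not dbt code: dbt resolves schema names in the generate_schema_name() Jinja macro, and the helper name schema_for_branch and the sanitizing rule are assumptions made for this example. It shows one way a branch name could become a schema suffix, with an empty branch value leaving the schema unchanged.

```python
import re


def schema_for_branch(base_schema: str, branch: str) -> str:
    """Append a sanitized branch name to a base schema name.

    Hypothetical helper for illustration only; dbt itself resolves schema
    names in the generate_schema_name() Jinja macro, not in Python.
    """
    # An empty branch (not a git repo, detached HEAD, ...) leaves the schema unchanged.
    if not branch:
        return base_schema
    # Collapse anything that is not a lowercase letter or digit into "_",
    # since characters like "/" and "-" are not safe in unquoted identifiers.
    suffix = re.sub(r"[^a-z0-9]+", "_", branch.lower()).strip("_")
    return f"{base_schema}_{suffix}" if suffix else base_schema
```

With this rule, a base schema of analytics on branch feature/new-orders would map to analytics_feature_new_orders, and everyone on that branch would share the same schema.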

Describe alternatives you've considered

People could potentially define a var in dbt_project.yml to hard-code the branch they are on. This variable would then have the same value as the current branch, but the drawbacks are that:

Who will this benefit?

Are you interested in contributing this feature?

Yes!

Anything else?

No response

jtcohen6 commented 1 year ago

Thanks @b-per! There's been previous discussion about this:

The most recent comment (from February) has the exact same request & use case, and I think I buy it. When you have multiple developers working on a single "feature" over the course of a few weeks, you may want them to:

What are the risks?

dbt <> git interaction. dbt doesn't install git as a Python package dependency; it just uses the git available in the OS, and shells out to it inside a subprocess. This can get pretty gross. Right now, all git interactions are limited to dbt deps, and quite unrelated to all other dbt functionality. But if we were to start running git commands as part of resolving Jinja context methods... There is pygit2, which might be a better / lighter-weight way to do something as simple as "tell me the current branch name"?
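For reference, "shelling out to git inside a subprocess" for the branch name could look roughly like the sketch below. This is not how dbt-core does it; the function name current_git_branch is made up for the example, and returning an empty string on any failure is an assumption based on the fallback suggested in the original request.

```python
import subprocess


def current_git_branch(repo_dir: str = ".") -> str:
    """Return the current branch name, or "" if it cannot be determined."""
    try:
        result = subprocess.run(
            # -q keeps git quiet when HEAD is detached (it exits non-zero instead).
            ["git", "symbolic-ref", "--short", "-q", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # git is not installed, or repo_dir does not exist.
        return ""
    if result.returncode != 0:
        # Not a git repository, or HEAD is detached.
        return ""
    return result.stdout.strip()
```

Every failure mode (no git binary, not a repo, detached HEAD) collapses to the same empty value, which keeps the Jinja side simple but hides why the branch is missing.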

Partial parsing. I think this could have some wacky interactions with partial parsing. If you change your git branch, and you use the git_branch variable in your custom generate_schema_name macro (which is resolved at parse time) — in order to re-resolve all those schema configs, either:

b-per commented 1 year ago

I am a bit stuck on the partial parsing side of things but the rest (feature and testing) should be OK.

I have also added a git_sha variable for the latest SHA, based on git log. I think it could be useful in query_comment, etc.

Thinking about it now, would we want to add this info in run_results.json and/or manifest.json as well? Would we want to have the branch information in the Metadata API for example?

aranke commented 1 year ago

I'm commenting here since I was tagged on the PR. First off, thanks for opening this discussion and a corresponding PR, @b-per!

The big question I have from reading this issue is:

Why is saving the Git branch in an env_var (similar to https://stackoverflow.com/a/10915331) not a viable alternative?
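To make the env_var alternative concrete, here is a small sketch of a wrapper that exposes the branch to dbt through the environment. The variable name DBT_GIT_BRANCH and both function names are hypothetical; the only dbt feature assumed is the existing env_var() Jinja function.

```python
import os
import subprocess


def dbt_env_with_branch(branch: str, base_env=None) -> dict:
    """Return a copy of the environment with DBT_GIT_BRANCH set.

    DBT_GIT_BRANCH is a hypothetical variable name chosen for this sketch.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DBT_GIT_BRANCH"] = branch
    return env


def run_dbt(args, branch: str) -> int:
    """Invoke the dbt CLI with the branch exposed as an environment variable."""
    completed = subprocess.run(["dbt", *args], env=dbt_env_with_branch(branch))
    return completed.returncode
```

Inside the project, the value could then be read with something like {{ env_var('DBT_GIT_BRANCH', '') }}. The catch is the one raised in this issue: something outside dbt has to set the variable again after every branch switch.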

A few other considerations from an engineer's perspective (more for @jtcohen6):

  1. Do we want to add another dependency to dbt-core and be responsible for vendoring and distributing it?
    • Especially if we only use this dependency in one place?
  2. What's the impact on performance, especially in giant Git projects?
  3. Philosophically, do we want to align dbt-core and git closer together, or do we want to keep them more independent?

And one more thing: can we experiment with using Dulwich in the PR? From the homepage:

Dulwich is a Python implementation of the Git file formats and protocols, which does not depend on Git itself.

All functionality is available in pure Python. Optional C extensions can be built for improved performance.

This might be a way to mitigate the dbt <> git interaction called out above, but will probably be much slower (and maybe that's ok?).
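To illustrate what "does not depend on Git itself" can mean in the simplest case, the sketch below reads the branch straight from the .git/HEAD file in pure Python. It does not use Dulwich or its API; the function name is made up, and it assumes a standard checkout where .git is a directory (worktrees and submodules are not handled).

```python
from pathlib import Path


def branch_from_head_file(repo_dir: str = ".") -> str:
    """Read the branch name from .git/HEAD without invoking git.

    Returns "" when there is no readable .git/HEAD or HEAD is detached.
    Assumes .git is a directory; worktrees and submodules, where .git is
    a file, fall through to the "" result.
    """
    head = Path(repo_dir) / ".git" / "HEAD"
    try:
        content = head.read_text(encoding="utf-8").strip()
    except OSError:
        return ""
    prefix = "ref: refs/heads/"
    if content.startswith(prefix):
        return content[len(prefix):]
    # A detached HEAD stores a raw commit hash instead of a ref.
    return ""
```

A single small file read like this avoids both a subprocess and a new dependency, at the cost of covering only the common repository layout.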

I'm excited to see where this discussion goes and what solution we come up with!

b-per commented 1 year ago

My personal takes:

KarolinaGojny commented 10 months ago

Hi there, I'm looking for the exact feature described here. The goal is to generate models in schemas named after the current git branch, without setting a variable manually after switching to another branch. @b-per Did you figure anything out on this topic? I would be grateful for any clue or info about the status of this enhancement.

mahiki commented 1 month ago

Hi, I'm here from searching for "dbt run set target to current git branch".

It looks like there isn't a native way to do this interactively. In GitHub Actions CI/CD, the pull step selects a branch, and that branch is accessible as a variable for dbt commands, so my use case is also the interactive (REPL-style) development workflow.

Here's how I solve it with the just task runner (just.systems):

I have a justfile that computes the current git branch name and maps it to one of my target names.

# ./justfile
set positional-arguments := true
git_branch := `git symbolic-ref --short HEAD`

# select dev branch if not on main or stable (ie feature branches, etc)
branch := if (git_branch) == "main" { "main" } else if (git_branch) == "stable" { "stable" } else { "dev" }
current_git_commit := `git rev-parse HEAD | cut -c 1-8`

# just --list
_default:
  @just --list --unsorted

# dbt run --target <git branch name is inserted here> <the rest of your command>
run *args:
  dbt run --target {{branch}} "$@"

The just commands insert the target flag and branch name ahead of my dbt run commands.

# currently 'dev' branch
just run
    # dbt run --target dev "$@"