dbt-checkpoint / dbt-checkpoint

:fishing_pole_and_fish: List of `pre-commit` hooks to ensure the quality of your `dbt` projects.
MIT License

dbt-build pre-hook #151

Open kokorin opened 1 year ago

kokorin commented 1 year ago

Describe the feature you'd like

In our dbt project we use the snapshot feature. On top of the snapshots we build models (views) that contain monthly/weekly data. Snapshots can be built with either the `dbt snapshot` or the `dbt build` command.

Currently it is not possible to create snapshots with pre-commit hooks, so every user has to run `dbt build` manually before their first commit.

Additional context

I think it makes sense to implement hooks for both `dbt snapshot` and `dbt build`.

noel commented 1 year ago

@kokorin This is not aligned with our plans for checkpoint. We believe it should focus on validation hooks rather than wrapping every dbt command, since you can run those commands in a separate step of the GH Action. Is there a reason you would not be able to do that?

kokorin commented 1 year ago

Thank you for your reply. Originally we used dbt-checkpoint to validate changed/added models and all their upstream models in a pre-push hook. But recently our project has grown significantly and now has 600+ dbt nodes, so we stopped running dbt at pre-push. Instead we validate the whole project in CI.

FrankTub commented 3 weeks ago

We also validate the whole project in CI, since we are on GitLab + Postgres. In CI we run something like:

pre-commit run --all-files

With some help I could open a PR for this, if you would be open to it.

Would it be sufficient to create a file `dbt_checkpoint/dbt_build.py` with something like the code below? (I am pretty sure it needs more tweaking, for example to handle a changed seed or snapshot.)

import argparse
from typing import Any, Dict, List, Optional, Sequence

from dbt_checkpoint.utils import (
    add_config_args,
    add_dbt_cmd_args,
    add_dbt_cmd_model_args,
    add_filenames_args,
    extend_dbt_project_dir_flag,
    get_config_file,
    get_flags,
    paths_to_dbt_models,
    run_dbt_cmd,
)

def prepare_cmd(
    paths: Sequence[str],
    global_flags: Optional[Sequence[str]] = None,
    cmd_flags: Optional[Sequence[str]] = None,
    prefix: str = "",
    postfix: str = "",
    models: Optional[Sequence[str]] = None,
    config: Optional[Dict[str, Any]] = None,
) -> List[str]:
    # Avoid a shared mutable default argument for config.
    config = config or {}
    global_flags = get_flags(global_flags)
    cmd_flags = get_flags(cmd_flags)
    # Explicitly passed models take precedence over models derived from file paths.
    if models:
        dbt_models = models
    else:
        dbt_models = paths_to_dbt_models(paths, prefix, postfix)
    dbt_project_dir = config.get("dbt-project-dir")
    cmd = ["dbt", *global_flags, "build", "-m", *dbt_models, *cmd_flags]
    return extend_dbt_project_dir_flag(cmd, cmd_flags, dbt_project_dir)

def main(argv: Optional[Sequence[str]] = None) -> int:
    parser = argparse.ArgumentParser()
    add_filenames_args(parser)
    add_dbt_cmd_args(parser)
    add_dbt_cmd_model_args(parser)
    add_config_args(parser)

    args = parser.parse_args(argv)
    config = get_config_file(args.config)
    cmd = prepare_cmd(
        args.filenames,
        args.global_flags,
        args.cmd_flags,
        args.model_prefix,
        args.model_postfix,
        args.models,
        config
    )
    return run_dbt_cmd(cmd)

if __name__ == "__main__":
    exit(main())
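For the seed and snapshot case mentioned above, one possible direction is to select nodes by file path instead of by model name, since a snapshot's name is declared inside its file and need not match the file stem. The sketch below is standalone and does not use `dbt_checkpoint.utils`; the names `NODE_SUFFIXES`, `paths_to_selectors` and `build_cmd` are hypothetical, and it assumes dbt's `path:` selector method matches the changed seed, snapshot and model files in your project layout.

```python
from pathlib import PurePosixPath
from typing import List, Sequence

# Assumption: models and snapshots live in .sql files, seeds in .csv files.
NODE_SUFFIXES = {".sql", ".csv"}


def paths_to_selectors(paths: Sequence[str]) -> List[str]:
    """Turn changed file paths into `path:` selectors, skipping other files."""
    selectors = []
    for raw in paths:
        path = PurePosixPath(raw)
        if path.suffix.lower() in NODE_SUFFIXES:
            selectors.append(f"path:{path}")
    return selectors


def build_cmd(paths: Sequence[str]) -> List[str]:
    """Assemble a `dbt build` command, or an empty list if nothing is selected."""
    selectors = paths_to_selectors(paths)
    if not selectors:
        return []
    return ["dbt", "build", "--select", *selectors]


if __name__ == "__main__":
    changed = ["models/stg_orders.sql", "seeds/countries.csv", "README.md"]
    print(build_cmd(changed))
```

Returning an empty list when no file qualifies lets the caller skip the dbt invocation entirely. Note that `.sql` files that are not nodes (macros, for example) would still be passed as selectors here, so a real hook would probably need stricter filtering.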

FrankTub commented 3 weeks ago

I now see that there was a PR for this that was never merged: https://github.com/dbt-checkpoint/dbt-checkpoint/pull/152. Would you be willing to reconsider, @noel?