[Feature] Add `target_path` to `global_flags`

dbeatty10 commented 9 months ago

Is this your first time submitting a feature request?

[X] I have read the expectations for open source contributors
[X] I have searched the existing issues, and I could not find an existing issue for this feature
[X] I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

From @graciegoheen here:

Like looking at this page for target-path, I don't see anything for dbt retry - is that expected @ChenyuLInx ?

This is explained by@p.target_path not being included within the flags for dbt retry.

The easiest way to "solve" this would be adding @p.target_path to the dbt retry command here.

But the proposed way to implement this would be to add it to the global_flags and remove it from each of the sub-commands.

Current status

Currently, @p.target_path is included for all sub-commands except for:

Describe alternatives you've considered

A couple alternatives:

don't add target_path to any more sub-commands (which is the status quo)
add target_path only to dbt retry, but not dbt debug or dbt deps

Who will this benefit?

If dbt retry produces any artifacts to the target directory, then this flag would make that location configurable.

Are you interested in contributing this feature?

No response

Anything else?

This would resolve https://github.com/dbt-labs/dbt-core/issues/8948 also.

dbeatty10 commented 9 months ago

[!TIP] All the << EOF business below is the start of a "here document" -- just a handy way to write dbt model files on the fly for reprex purposes.

`dbt retry`

Since dbt retry writes files to the target_path directory, it should be configurable via DBT_TARGET_PATH or --target-path just like nearly all other dbt sub-commands.

Acceptance criteria 1

Users should be able to set the DBT_TARGET_PATH environment variable and have it work with subsequent retries:

export DBT_TARGET_PATH=artifacts
cat << EOF > models/my_model.sql
select 1/0 as id
EOF
dbt run -s my_model
cat << EOF > models/my_model.sql
select 1 as id
EOF
dbt retry

Instead, they will get this error:

01:10:28  Encountered an error:
Runtime Error
  Could not find previous run in 'target' target directory

Acceptance criteria 2

Put another way...

Suppose you start with some bad SQL, run it, and then fix it:

dbt clean
rm -rf target
rm -rf artifacts
cat << EOF > models/my_model.sql
select 1/0 as id
EOF
dbt run -s my_model --target-path target
mkdir -p artifacts
mv target/run_results.json artifacts/
rm -rf target
cat << EOF > models/my_model.sql
select 1 as id
EOF

This command works, and it writes output files to the target directory:

dbt retry --state artifacts

We should make it so this command writes output files into the my_target_path directory:

dbt retry --state artifacts --target-path my_target_path

Something to consider

As shown in the logic here:

dbt retry will use the configured state directory to find run_results.json.
Otherwise, it will fall back to the configured target_path directory.

When these are the same directory, then the newest run_results.json file will overwrite the pre-existing one.

Maybe this is partially why there was the following warning message in 1.7.4 and earlier? Or maybe that warning was there for different reasons?

Warning: The state and target directories are the same: 'target'. This could lead to missing changes due to overwritten state including non-idempotent retries.

This warning was added in https://github.com/dbt-labs/dbt-core/pull/8638. Then the warning disappeared following https://github.com/dbt-labs/dbt-core/pull/9328 (which resolved https://github.com/dbt-labs/dbt-core/issues/9327).

Separate acceptance criteria

In the case of successive executions of dbt retry, we should make sure that it is okay that run_results.json is overwritten.

We should probably move this separate acceptance criteria to its own issue, but leaving it here for sake of expedience.

See below for an example of successive executions of dbt retry. I'm not sure or if it is a sufficient example to show that it's okay for run_results.json to be overwritten each time.

cat << EOF > models/my_model.sql
select 1/0 as id
EOF
dbt run -s my_model
dbt retry
cat << EOF > models/my_model.sql
select 1 as id
EOF
dbt retry

dbeatty10 commented 9 months ago

Decision

@aranke and I discussed this over a video call today and decided to add this to the global_flags and remove it from each of the sub-commands.

Here are the reasons:

there are very few commands that wouldn't want to write to the target-path directory (which is target by default)
if we add any dbt commands in the future, they will automatically get the target-path configuration for free
any commands that don't write to the target-path directory will just ignore that value

dbt-labs / dbt-core