dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[Feature] DBT clone should output more useful information to the log #9501

Open ttusing opened 10 months ago

ttusing commented 10 months ago

Is this your first time submitting a feature request?

Describe the feature

Currently, running dbt clone writes very little output to the log. Here is an example log:

dbt clone --full-refresh

17:39:45  Running with dbt=1.7.6
17:39:47  Registered adapter: snowflake=1.7.1
17:39:52  Found x models...
17:39:53  
17:39:55
17:39:55  Concurrency: 32 threads (target=...)
17:39:55  
17:40:48  
17:40:48
17:40:48  Completed successfully
17:40:48  
17:40:48
17:40:48  Done. PASS=430 WARN=0 ERROR=0 SKIP=0 TOTAL=430

I get very little information about what objects were cloned, their names/schemas/database, or anything else.

Information that would be useful in the log:

Describe alternatives you've considered

Using the debug log, which contains some of this information in a less convenient format.

Who will this benefit?

Any developers using dbt clone, especially as part of a deploy process.

Are you interested in contributing this feature?

Yes, especially if pointed to the relevant modules.

Anything else?

No response

dbeatty10 commented 10 months ago

Thanks for reaching out @ttusing !

We agree that it would be nice to have more information logged to the console.

For example, it would be nice to have output similar to dbt build, like this:

20:30:59  1 of 2 START sql view model dbt_dbeatty.my_model_1 ............................. [RUN]
20:30:59  2 of 2 START sql table model dbt_dbeatty.my_model_2 ............................ [RUN]
20:30:59  1 of 2 OK created sql view model dbt_dbeatty.my_model_1 ........................ [SUCCESS 1 in 0.78s]
20:31:00  2 of 2 OK created sql table model dbt_dbeatty.my_model_2 ....................... [SUCCESS 1 in 1.55s]

Acceptance criteria

  1. Show progress in the console log that is similar to dbt run
  2. Raise a warning when a model is not cloned because it already exists and the --full-refresh flag was not passed to override it.
  3. Specify in the warning message that this behavior can be overridden with the --full-refresh flag
  4. Each cloned model should appear within target/run_results.json

It looks to me like 2 and 4 are already covered, but we should double-check.
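One way to double-check criterion 4 is to read target/run_results.json after a clone and compare it against the models you expected. The snippet below is only a rough sketch: it assumes the usual artifact layout of a top-level "results" list whose entries carry "unique_id" and "status", and the function names (summarize_run_results, missing_models) are made up for illustration, not part of dbt.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_run_results(path):
    """Return (unique_ids, status_counts) read from a run_results.json file.

    Assumes a top-level "results" list whose entries have
    "unique_id" and "status" keys.
    """
    data = json.loads(Path(path).read_text())
    results = data.get("results", [])
    unique_ids = [r["unique_id"] for r in results]
    status_counts = Counter(r["status"] for r in results)
    return unique_ids, status_counts


def missing_models(path, expected_ids):
    """Return the expected unique_ids that do not appear in run_results.json."""
    unique_ids, _ = summarize_run_results(path)
    return sorted(set(expected_ids) - set(unique_ids))
```

Calling missing_models("target/run_results.json", expected_ids) with the unique IDs you expected to be cloned should return an empty list if criterion 4 holds. The path assumes the default target directory, so adjust it if --target-path is set.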

For example, here is the output I get when the objects already exist:

$ dbt clone --state artifacts --target green     
20:51:01  Running with dbt=1.7.5
20:51:03  Registered adapter: snowflake=1.7.1
20:51:03  Found 2 models, 1 analysis, 0 sources, 0 exposures, 0 metrics, 436 macros, 0 groups, 0 semantic models
20:51:03  
20:51:04  Concurrency: 10 threads (target='green')
20:51:04  
20:51:04  Relation "ANALYTICS_DEV"."DBT_DBEATTY_GREEN"."MY_MODEL" already exists
20:51:04  Relation "ANALYTICS_DEV"."DBT_DBEATTY_GREEN"."MYMODEL" already exists
20:51:04  
20:51:04  Completed successfully
20:51:04  
20:51:04  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2

But those go away when I use the --full-refresh flag.
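Until there is a proper warning, a deploy script could scan the captured console output for those "already exists" lines to see which relations were left untouched. This is a rough sketch that assumes the message format shown in the output above (a quoted, dot-separated relation name followed by "already exists"); the wording may differ across adapters and versions, and skipped_relations is a hypothetical helper name.

```python
import re

# Pattern based on the console lines shown above, e.g.
#   Relation "DB"."SCHEMA"."NAME" already exists
# The exact wording is an assumption and may vary by adapter or dbt version.
ALREADY_EXISTS = re.compile(
    r'Relation (?P<relation>"[^"]+"(?:\."[^"]+")*) already exists'
)


def skipped_relations(log_text):
    """Return the relation names reported as already existing, in log order."""
    return [m.group("relation") for m in ALREADY_EXISTS.finditer(log_text)]
```

If the list is non-empty, those relations were not re-cloned, and rerunning with --full-refresh would replace them.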

Example commands

dbt build --target blue --target-path artifacts  
dbt clone --target green --state artifacts --full-refresh