dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.68k stars 1.61k forks source link

[Feature] better performance logging for `docs generate` command #10158

Open dataders opened 4 months ago

dataders commented 4 months ago

Is this your first time submitting a feature request?

Describe the feature

the user wants the following debug log, to be reported as info but with more added context about which query was run.

SQL status: SUCCESS 24005 in 239.0 seconds

Originally posted by @extrospective in https://github.com/dbt-labs/dbt-snowflake/issues/1034#issuecomment-2115550220

[...] If I run "dbt docs generate" there is little useful information. [...] if docs generation were to continue to be a multi-minute query some info could be added there.

On the other hand, when running with debug:

When you run "dbt docs generate --debug" - there is a line which reads for me:

SQL status: SUCCESS 24005 in 239.0 seconds

That line is perhaps the most important in the file - since it accounts for much of the time.

One option - make the line more clear what it refers to. 3 seconds later in the log file is what presumably is the connection which caused it:

13:32:56  SQL status: SUCCESS 24005 in 239.0 seconds
13:32:59  On dbt_db.information_schema: Close

if there is parallelization, no guarantee something will not be in between - so I might like a more descriptive first line.

And if the problem is not made to "go away", I could imagine accumulating some critical timing information for these important queries, and at the end of the log file process (with --debug) flushing that critical information there - so it stands out from the 43K line file.

Describe alternatives you've considered

docs generate is faster

Who will this benefit?

all users of dbt docs

Are you interested in contributing this feature?

No response

Anything else?

No response

jtcohen6 commented 4 months ago

Perhaps some INFO-level logging to this effect:

$ dbt docs generate
12:18:42  Running with dbt=1.8.0
12:18:42  Registered adapter: snowflake=1.8.1
12:18:43  Found 1234 models, ...
12:18:43
12:18:43  Compiling project # unless --no-compile
12:19:43  Building catalog
12:19:55  Finished generating catalog for 1234 objects in database1.schema1
12:19:57  Finished generating catalog for 98765 objects in database1.schema2
12:20:01  Catalog written to /Users/jerco/dev/scratch/testy/target/catalog.json

Those specific events might make most sense to fire in adapter.get_filtered_catalog, in which case we should move to dbt-adapters.