Closed ajmarcus closed 2 years ago
Haven't worked with flamegraphs before so it took me some time to understand this one. I think a good amount of the runtime is importing the classes from dbt.task that are used to set_defaults for the subparsers:
https://github.com/dbt-labs/dbt-core/blob/v1.0.1/core/dbt/main.py#L18-L33
https://github.com/dbt-labs/dbt-core/blob/v1.0.1/core/dbt/main.py#L358
Since the defaults are only used if a subcommand is being called, I was thinking we could improve performance by only importing the relevant dbt.task module. For example:
dbt build: we will only import dbt.task.build as build_task
dbt clean: we will only import dbt.task.clean as clean_task
dbt: without a subcommand we will not import any classes from dbt.task
If that sounds like a good approach, I'd be happy to create an example PR. As with any performance optimization, I'm sure there will be tradeoffs to my approach, so no worries if the code is never merged. Curious to learn more about dbt's internals. :)
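The proposal above could be sketched roughly as follows. This is a minimal illustration, not dbt's actual code: the TASK_MODULES mapping, make_parser, and load_task are hypothetical names, and standard-library modules stand in for dbt.task.build and dbt.task.clean so the snippet runs without dbt installed.

```python
import argparse
import importlib

# Hypothetical mapping from subcommand name to the module implementing it.
# In dbt these would be paths such as "dbt.task.build"; stdlib modules are
# used here only as stand-ins so the sketch is runnable anywhere.
TASK_MODULES = {
    "build": "colorsys",
    "clean": "wave",
}


def make_parser():
    """Build the CLI parser without importing any task module."""
    parser = argparse.ArgumentParser(prog="demo")
    subparsers = parser.add_subparsers(dest="command")
    for name in TASK_MODULES:
        subparsers.add_parser(name)
    return parser


def load_task(argv):
    """Import only the module for the requested subcommand.

    Returns None when no subcommand is given, so nothing is imported.
    """
    args = make_parser().parse_args(argv)
    if args.command is None:
        return None
    return importlib.import_module(TASK_MODULES[args.command])


if __name__ == "__main__":
    print(load_task(["build"]))
```

The tradeoff is that an import error in a task module only surfaces when that subcommand is run, rather than at startup.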
Thanks for the detailed analysis @ajmarcus! Anecdotally, I definitely notice this myself when I call dbt --version and I think you're right that we can probably benefit from a quick fix. Thankfully we're planning on making larger changes to the structure of our CLI to make it more reusable, which may also solve the performance issues outlined here. Thank you for offering to put up a PR, but in this case I think we should save the effort since it is likely to be addressed soon anyway. You can follow that work once it starts in earnest on these two tickets:
Because of this, I'm going to close this ticket but explicitly mention it in the other tickets as additional context for why this work is important. Thanks so much for providing all the details here.
Sounds great and makes sense!
Is there an existing issue for this?
Current Behavior
When I run the dbt command in my terminal without additional arguments, it takes more than a second to print the help text:
Looks like this is a related issue with perceived run performance for specific commands: https://github.com/dbt-labs/dbt-core/issues/4625
Expected Behavior
Basic operations with the dbt CLI would take less than one second to run.
Steps To Reproduce
Install dbt:
Run dbt:
Relevant log output
This could definitely be an issue with my machine or setup. To see if I could find more information I used py-spy to generate a flamegraph using these commands:
There is nothing conclusive, but it looks like a lot of time is spent in calls to importlib:
https://gist.githubusercontent.com/ajmarcus/b077f4c8a94582df74c39258325addf5/raw/9af82bb55f5b322fb10bc0a959c20b175e689c3c/profile_cli_empty.svg
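As a complementary check to the flamegraph (not the commands used above), CPython's -X importtime option (Python 3.7+) reports how long each import takes. The sketch below runs it in a fresh interpreter and lists the slowest imports. The helper name slowest_imports is hypothetical, and "json" is only a placeholder for whichever module is under investigation.

```python
import subprocess
import sys


def slowest_imports(module, top=5):
    """Return (name, cumulative_microseconds) for the slowest imports.

    Runs `python -X importtime -c "import <module>"` in a subprocess and
    parses the report that CPython writes to stderr.
    """
    result = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    prefix = "import time:"
    rows = []
    for line in result.stderr.splitlines():
        if not line.startswith(prefix):
            continue
        parts = line[len(prefix):].split("|")
        if len(parts) != 3:
            continue
        try:
            cumulative_us = int(parts[1])
        except ValueError:
            continue  # header row, not a measurement
        rows.append((parts[2].strip(), cumulative_us))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


if __name__ == "__main__":
    for name, micros in slowest_imports("json"):
        print(f"{micros:>10} us  {name}")
```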
Which reminded me that Python eagerly loads imported dependencies. importlib does provide a lazy import loader but it comes with this warning:
https://python.readthedocs.io/en/latest/library/importlib.html#importlib.util.LazyLoader
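For reference, a minimal sketch of how importlib.util.LazyLoader can be used, following the recipe in the Python documentation. The lazy_import helper name is illustrative, and colorsys is just a small standard-library module chosen for the demonstration. Whether this is suitable for dbt is an open question given the warning linked above.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module whose code runs only on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot find module {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this only sets up the deferred load; the module body
    # is executed later, when an attribute is first looked up.
    loader.exec_module(module)
    return module


if __name__ == "__main__":
    colorsys = lazy_import("colorsys")
    # The real import happens on this attribute access.
    print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))
```

Because execution is postponed, any error raised while importing the module appears at the first attribute access instead of at the import site.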
Environment
What database are you using dbt with?
bigquery
Additional Context