dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0
9.88k stars, 1.63k forks

[Feature] Improve startup time of typing `dbt` #9814

Closed b-per closed 3 weeks ago

b-per commented 7 months ago

Is this your first time submitting a feature request?

Describe the feature

On my Mac M1, running dbt without any subcommand or flag takes between 1.2 and 1.5 secs to run and show me the subcommands list (measured with time).

This is not a big problem when using dbt on a day-to-day basis, but it prevents us from being able to leverage the out-of-the-box shell completion from Click, as implemented here. Each call to complete the command or params takes between 1.2 and 1.5 secs, making the completion not really usable.

I'd expect that running dbt without any parameter or subcommand would be instantaneous and not take more than 1 sec.

Describe alternatives you've considered

Not improving the startup speed and not being able to leverage the free completion script from Click.

Who will this benefit?

Are you interested in contributing this feature?

Yes, but am I the best person

Anything else?

No response

b-per commented 7 months ago

I am wondering if some of the imports from that file are causing the delay and if we could move them inside specific functions/classes

import functools
from copy import copy
from dataclasses import dataclass
from typing import Callable, List, Optional, Union

import click
from click.exceptions import (
    Exit as ClickExit,
    BadOptionUsage,
    NoSuchOption,
    UsageError,
)

from dbt.cli import requires, params as p
from dbt.cli.exceptions import (
    DbtInternalException,
    DbtUsageException,
)
from dbt.contracts.graph.manifest import Manifest
from dbt.artifacts.schemas.catalog import CatalogArtifact
from dbt.artifacts.schemas.run import RunExecutionResult
from dbt_common.events.base_types import EventMsg
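
To make the idea of moving imports inside functions concrete, here is a minimal sketch. It is not dbt's actual code: `list_commands`, `serialize_result`, and the `COMMANDS` list are hypothetical names, and the standard-library `json` module only stands in for a heavy import such as `dbt.contracts.graph.manifest`.

```python
import sys
from typing import List

# Hypothetical, illustrative subset of command names.
COMMANDS = ["build", "run", "test"]


def list_commands() -> List[str]:
    """Fast path: needs nothing beyond this module's top-level imports."""
    return sorted(COMMANDS)


def serialize_result(result: dict) -> str:
    """Slow path: the costly import runs only when this function is called."""
    import json  # stand-in for a heavy module; deferred until first use

    return json.dumps(result, sort_keys=True)


def is_loaded(module_name: str) -> bool:
    """Report whether a module has already been imported in this process."""
    return module_name in sys.modules
```

Python caches imported modules in `sys.modules`, so the deferred import is only paid for on the first call. The trade-off is that import errors surface at call time rather than at startup.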

b-per commented 7 months ago

Here is a screenshot from "tuna", obtained by running python -X importtime core/dbt/cli/main.py 2> tuna.log followed by tuna tuna.log. It shows how long each import takes.

image

Most of those imports shouldn't be required when running a simple dbt command but I don't know the effort behind not loading those in that case.
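
For a quick text-only view of the same data without tuna, the log written by `-X importtime` can be parsed directly. This is a rough sketch: the function names `parse_importtime`, `slowest`, and `measure` are made up for illustration, and the parser assumes the usual `import time: self [us] | cumulative | imported package` line layout that CPython writes to stderr.

```python
import subprocess
import sys
from typing import List, Tuple

PREFIX = "import time:"


def parse_importtime(log: str) -> List[Tuple[str, int, int]]:
    """Return (module, self_us, cumulative_us) for each timing line in the log."""
    rows = []
    for line in log.splitlines():
        if not line.startswith(PREFIX):
            continue  # ignore anything else written to stderr
        parts = line[len(PREFIX):].split("|")
        if len(parts) != 3:
            continue
        try:
            self_us = int(parts[0])
            cumulative_us = int(parts[1])
        except ValueError:
            continue  # the header line has no numeric columns
        rows.append((parts[2].strip(), self_us, cumulative_us))
    return rows


def slowest(rows: List[Tuple[str, int, int]], n: int = 5) -> List[Tuple[str, int, int]]:
    """Return the n entries with the largest cumulative import time."""
    return sorted(rows, key=lambda row: row[2], reverse=True)[:n]


def measure(module: str) -> List[Tuple[str, int, int]]:
    """Import `module` in a fresh interpreter and parse the timing report."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_importtime(proc.stderr)
```

Something like `slowest(measure("dbt.cli.main"))` would then list the most expensive imports, assuming dbt is installed in the current environment.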

b-per commented 7 months ago

I just tried to strip quite a bit of code to find if there is a piece to focus on, but even after removing a lot of imports (making dbt work only to show its commands and args), it still takes more than 1 sec.

Instead of moving imports it might be better to have a "click only" flow where we don't call any code from core.dbt and just provide the commands/subcommands/arguments straight away.

image

peterallenwebb commented 7 months ago

@b-per Your timing could not be more perfect. I think I can get you a lot of the remaining time back. Check out https://github.com/dbt-labs/dbt-common/pull/98. This should go out with the next release of dbt-common, which I have been told is scheduled for tomorrow, March 26th.

b-per commented 7 months ago

Great news!!

While 0.7s is already much better than 1.5s, I can't help thinking that it still feels a bit long for something that just shows a list of commands and parameters.

peterallenwebb commented 7 months ago

Agreed. If your changes combined with mine don't get us down to "almost instant" then I can work with you to get us the rest of the way. I'm confident we can do it.

b-per commented 7 months ago

My changes are not OK to be merged because they break the normal dbt flow, but your change plus this heavyweight removal of imports makes dbt run in ~0.45 secs. So, a slight decrease from 0.7 but nothing major either.

dbeatty10 commented 7 months ago

It makes sense to be able to add sub-command and parameter completion as proposed in https://github.com/dbt-labs/dbt-completion.bash/pull/21

@b-per Two questions for you:

  1. It sounds like the poor responsiveness you observed is a known issue with Click. Did you already try this?
  2. If the above doesn't help, would this be useful, by any chance?

b-per commented 7 months ago

For 1., the docs say

To speed it up, write the generated script to a file, then source that

This is what the implementation in dbt-completion.bash already does; in that case the delay comes not from Click but from dbt itself.

I think 2. could help as well. But in that case, we would need a specific subcommand in dbt to get the matrix of all commands and all parameters allowed. Then, we'll need to process it in bash/zsh and make it work with their completion framework (possible but not trivial and not fun :-) )
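
As a rough illustration of what such a matrix could look like and how a completion script might query it, here is a small sketch. Everything in it is an assumption: `COMMAND_MATRIX` is a hand-written, partial table rather than something dbt exports today, and `complete` is a hypothetical helper.

```python
from typing import Dict, List

# Hand-written, illustrative subset; a real matrix would be generated by dbt.
COMMAND_MATRIX: Dict[str, List[str]] = {
    "build": ["--exclude", "--full-refresh", "--select"],
    "run": ["--exclude", "--full-refresh", "--select"],
    "test": ["--exclude", "--select"],
}


def complete(words: List[str], incomplete: str) -> List[str]:
    """Suggest completions for the partial token `incomplete`.

    `words` holds the tokens already typed after `dbt`.
    """
    if not words:
        # Nothing typed yet: offer subcommand names.
        candidates = sorted(COMMAND_MATRIX)
    else:
        # A subcommand is present: offer its known parameters.
        candidates = COMMAND_MATRIX.get(words[0], [])
    return [c for c in candidates if c.startswith(incomplete)]
```

Because the lookup is a plain dictionary access, it needs no heavy imports; the hard part remains wiring it into the bash/zsh completion frameworks.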

I feel like if we reach a dbt startup time of ~100-200 ms, then we wouldn't need #6840 and could just use the out-of-the-box completion without any noticeable delay.

For the moment, I am thinking of adding the "auto" completion in dbt-completion.bash, but only active when a given env var is set, in case people want to use it despite the perf hit.

github-actions[bot] commented 1 month ago

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions[bot] commented 3 weeks ago

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.