dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

[Feature] Configuration for runtime "priority" among models with satisfied dependencies #10632

Open owenprough-sift opened 2 months ago

owenprough-sift commented 2 months ago

Is this your first time submitting a feature request?

Describe the feature

Problem Statement

dbt runs models in DAG order, which is functionally correct. But there are situations^1 where it would be helpful to have more control over the relative execution order of models within a run. For example, in a run that includes a long-running model with no upstream dependencies but many downstream dependents, it would help to start the long-running model first to minimize total run time.

Proposed Solution

A new execution_order configuration which allows you to specify the relative execution order of selected resources. At runtime, dbt would:

  1. Determine the set of resources whose dependencies are satisfied (aka "run in DAG order")
  2. Within that set, run the resources ordered by execution_order (nulls last), falling back to whatever is the current ordering logic
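
To make the two steps above concrete, here is a minimal Python sketch of the proposed behavior. It is not dbt's actual scheduler: `dispatch_order`, the model names, and the alphabetical tie-break (standing in for "the current ordering logic") are all hypothetical, `execution_order` is the config proposed in this issue rather than an existing dbt setting, and the sketch assumes a single worker and that every resource appears as a key in `deps`.

```python
import heapq
from typing import Dict, List, Optional, Set


def dispatch_order(
    deps: Dict[str, Set[str]],
    execution_order: Dict[str, Optional[int]],
) -> List[str]:
    """Return the order in which a single worker would start each resource.

    deps maps every resource to the set of its upstream resources.
    execution_order maps a resource to its (proposed) priority; lower runs
    first, and missing or None values sort last.
    """
    remaining = {node: set(parents) for node, parents in deps.items()}
    children: Dict[str, List[str]] = {node: [] for node in deps}
    for node, parents in deps.items():
        for parent in parents:
            children[parent].append(node)

    def sort_key(node: str):
        order = execution_order.get(node)
        # (is-null flag, priority, name): nulls last, then by priority,
        # then by name as a stand-in for the existing ordering logic.
        return (order is None, order if order is not None else 0, node)

    # Step 1: the set of resources whose dependencies are satisfied.
    ready = [sort_key(n) for n, parents in remaining.items() if not parents]
    heapq.heapify(ready)

    started: List[str] = []
    while ready:
        # Step 2: within the ready set, pick by execution_order.
        *_, node = heapq.heappop(ready)
        started.append(node)
        for child in children[node]:
            remaining[child].discard(node)
            if not remaining[child]:
                heapq.heappush(ready, sort_key(child))

    if len(started) != len(deps):
        raise ValueError("dependency cycle detected")
    return started


# Hypothetical project: report depends on slow_model and customers.
deps = {
    "customers": set(),
    "orders": set(),
    "slow_model": set(),
    "report": {"slow_model", "customers"},
}

without_priority = dispatch_order(deps, {})
with_priority = dispatch_order(deps, {"slow_model": 1})
print(without_priority)
print(with_priority)
```

In this sketch, setting a priority on `slow_model` moves it to the front of the ready set, while DAG order is still respected: `report` never starts before both of its parents.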

Describe alternatives you've considered

Workarounds with which I am familiar:

Who will this benefit?

Folks with long-running models in the middle of their DAGs

Are you interested in contributing this feature?

No

Anything else?

I realize that giving developers some control over execution order is likely controversial and potentially complicated to implement, but I see this as a useful Advanced Feature™ (a la incremental predicates) for situations where the runtime of a complex DAG is suboptimal.

owenprough-sift commented 2 months ago

Another data point: https://getdbt.slack.com/archives/CBSQTAPLG/p1725649648110359?thread_ts=1725643109.036959&cid=CBSQTAPLG