apache / dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
https://dolphinscheduler.apache.org/
Apache License 2.0

[DSIP-49][Workflow] Rerun across workflow or project with dependency automatically #16194

Open zhuxt2015 opened 1 week ago

zhuxt2015 commented 1 week ago


Motivation

There are three workflows in one project, for example ods, dwd and dws. Workflow ods contains two tasks, A and B; workflow dwd contains two tasks, C and D; workflow dws contains one task, E. One of the dependency chains is A -> C -> E. If the calculation logic of task A is wrong or the task fails, the downstream tasks C and E need to be rerun. Currently A, C and E can only be rerun manually, one at a time; C and E are not started automatically when A finishes.

Design Detail

The task lineage can be derived from three tables: t_ds_process_task_relation, t_ds_process_definition and t_ds_task_definition. From the lineage, all downstream tasks of task A can be found and then rerun layer by layer according to the dependency hierarchy. For example, if C is downstream of A and E is downstream of C, then C is rerun once the rerun of A completes, and E is rerun once the rerun of C completes.
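
To make the layer-by-layer idea concrete, here is a minimal sketch in Python. It is not DolphinScheduler code. It assumes the lineage has already been extracted from the three tables into plain (upstream, downstream) pairs, and the names `downstream_layers`, `rerun_in_order` and `run_task` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict, deque


def downstream_layers(edges, root):
    """Group root and everything downstream of it into rerun layers.

    edges: iterable of (upstream, downstream) task pairs. Assumption: these
    pairs were already derived from the lineage tables.
    A task appears in a layer only after all of its upstream tasks inside
    the affected subgraph have appeared in earlier layers.
    """
    children = defaultdict(set)
    for up, down in edges:
        children[up].add(down)

    # Breadth-first search: collect every task reachable from root.
    reachable = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in reachable:
                reachable.add(child)
                queue.append(child)

    # Count incoming edges, restricted to the affected subgraph.
    indegree = {node: 0 for node in reachable}
    for up in reachable:
        for down in children[up]:
            indegree[down] += 1

    # Kahn-style layering: each pass emits the tasks whose upstreams are done.
    layers = []
    emitted = 0
    current = sorted(n for n, d in indegree.items() if d == 0)
    while current:
        layers.append(current)
        emitted += len(current)
        ready = []
        for node in current:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
        current = sorted(ready)

    if emitted != len(reachable):
        raise ValueError("dependency cycle detected in task lineage")
    return layers


def rerun_in_order(layers, run_task):
    """Rerun layer by layer; stop at the first failure.

    run_task is a hypothetical callback that reruns one task, waits for it
    to finish, and returns True on success.
    Returns the list of tasks that were rerun successfully.
    """
    finished = []
    for layer in layers:
        for task in layer:
            if not run_task(task):
                return finished
            finished.append(task)
    return finished


if __name__ == "__main__":
    lineage = [("A", "C"), ("C", "E")]  # B and D have no downstream tasks
    print(downstream_layers(lineage, "A"))
```

With the lineage from the Motivation section, rerunning A produces one layer per task, in the order A, C, E. Stopping at the first failure is a design choice of this sketch: rerunning E on top of a failed C would only reproduce the bad data further downstream.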

Compatibility, Deprecation, and Migration Plan

  1. If an instance of a task to be rerun is already running, it must be stopped first to avoid errors caused by running the same task instance concurrently.
  2. You can only rerun workflows for which you have execution permission.

Test Plan

No response
