apache / dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
https://dolphinscheduler.apache.org/
Apache License 2.0
12.73k stars, 4.58k forks

[DSIP-73] Add dolphinscheduler-task-executor module to unify the task execution logic #16619

Status: Open. Opened by ruanwenjun 2 weeks ago.


Motivation

Right now, we have two similar task execution processes.

The two processes are largely the same and differ mainly in their thread models. The problem is that we maintain two copies of similar code: whenever we want to change the task execution runtime, we have to change both. This DSIP aims to unify them.

Design Detail

I propose adding a new module, dolphinscheduler-task-executor, which will provide a TaskEngine responsible for task execution.

TaskEngine

The high-level architecture may look like this:

[Figure: high-level architecture of the TaskEngine]

The WorkflowEngine will use a TaskExecutorClient to communicate with the TaskEngine. The TaskEngine provides an interface for controlling TaskExecutors, and it can also send runtime events back to the master.
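A minimal Java sketch of this interaction, under stated assumptions: the method names `dispatch`/`kill`, the `MasterEventReporter` callback and the in-process `InMemoryTaskEngine` are hypothetical and only illustrate the direction of the calls (control flows from WorkflowEngine through the client into the TaskEngine, runtime events flow back to the master). A real client would presumably go over RPC.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical control surface exposed by the TaskEngine (method names are assumptions).
interface TaskEngine {
    void dispatch(int taskInstanceId);
    void kill(int taskInstanceId);
}

// Hypothetical callback the TaskEngine uses to report runtime events back to the master.
interface MasterEventReporter {
    void report(int taskInstanceId, String event);
}

// What the WorkflowEngine holds: a thin client that forwards control calls to a TaskEngine.
// Here it delegates in-process; a real client would likely be an RPC stub.
final class TaskExecutorClient {
    private final TaskEngine engine;

    TaskExecutorClient(TaskEngine engine) { this.engine = engine; }

    void dispatch(int taskInstanceId) { engine.dispatch(taskInstanceId); }

    void kill(int taskInstanceId) { engine.kill(taskInstanceId); }
}

// Toy engine: records task state and reports every state change to the master.
final class InMemoryTaskEngine implements TaskEngine {
    private final Map<Integer, String> states = new ConcurrentHashMap<>();
    private final MasterEventReporter reporter;

    InMemoryTaskEngine(MasterEventReporter reporter) { this.reporter = reporter; }

    @Override
    public void dispatch(int id) {
        states.put(id, "RUNNING");
        reporter.report(id, "RUNNING");
    }

    @Override
    public void kill(int id) {
        // Only tasks the engine knows about can be killed; unknown ids are ignored.
        if (states.replace(id, "KILLED") != null) {
            reporter.report(id, "KILLED");
        }
    }

    String stateOf(int id) { return states.get(id); }
}
```

Keeping the client as a thin forwarder is what lets the same TaskEngine serve callers regardless of where the engine runs.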

The TaskEngine consists of the following components.

TaskExecutor

A TaskExecutor represents a running task in the TaskEngine.

[Figure: structure of a TaskExecutor]

Each TaskExecutor contains an EventBus that stores the events belonging to that task executor. Every operation on the task executor is transformed into a TaskExecutorLifecycleEvent and fired asynchronously. Events are fired in order, which avoids concurrency problems between operations on the same task executor.
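The ordering guarantee can be illustrated with a small per-executor event bus. This is a sketch under assumptions: `TaskExecutorEventBus`, `publish`, the record fields and the drain-flag technique (a common "serial executor" pattern) are chosen for illustration and are not taken from the proposal. Events are queued, and at most one drain task per bus runs on a shared pool at a time, so events of one TaskExecutor are handled one at a time and in publish order.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Hypothetical event shape; the real TaskExecutorLifecycleEvent presumably carries more data.
record TaskExecutorLifecycleEvent(int taskExecutorId, String type) {}

// One bus per TaskExecutor. Operations are published as events and fired asynchronously
// on the given pool, in publish order, by at most one thread at a time.
final class TaskExecutorEventBus {
    private final Queue<TaskExecutorLifecycleEvent> queue = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean draining = new AtomicBoolean(false);
    private final Executor firingPool;
    private final Consumer<TaskExecutorLifecycleEvent> handler;

    TaskExecutorEventBus(Executor firingPool, Consumer<TaskExecutorLifecycleEvent> handler) {
        this.firingPool = firingPool;
        this.handler = handler;
    }

    // Callers never run the handler themselves; they only enqueue and return.
    void publish(TaskExecutorLifecycleEvent event) {
        queue.add(event);
        scheduleDrain();
    }

    private void scheduleDrain() {
        // Only one drain per bus may be in flight; this is what serializes the events.
        if (draining.compareAndSet(false, true)) {
            firingPool.execute(this::drain);
        }
    }

    private void drain() {
        TaskExecutorLifecycleEvent event;
        while ((event = queue.poll()) != null) {
            try {
                handler.accept(event);
            } catch (RuntimeException e) {
                // Swallowed so one failing event cannot stall the rest; real code would log it.
            }
        }
        draining.set(false);
        // An event published between the last poll() and the flag reset would otherwise be stranded.
        if (!queue.isEmpty()) {
            scheduleDrain();
        }
    }
}
```

Because ordering is enforced per bus rather than per thread, many TaskExecutors can share one firing pool without their events interleaving within a single executor.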

TaskExecutorRepository

Stores TaskExecutors at runtime. Once a TaskExecutor finishes executing, it is removed from the TaskExecutorRepository.
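A minimal sketch of such a repository, assuming TaskExecutors are keyed by an integer id. The `TaskExecutor` stand-in, its `finished` flag and the method names `put`, `get` and `onFinished` are hypothetical; the point is only that the map holds live executors and drops each one when it finishes.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Minimal stand-in for a TaskExecutor: only an id and a finished flag (assumed for illustration).
final class TaskExecutor {
    private final int id;
    private volatile boolean finished;

    TaskExecutor(int id) { this.id = id; }

    int id() { return id; }

    boolean isFinished() { return finished; }

    void markFinished() { finished = true; }
}

// Holds the TaskExecutors that are currently alive in this TaskEngine.
final class TaskExecutorRepository {
    private final Map<Integer, TaskExecutor> executors = new ConcurrentHashMap<>();

    void put(TaskExecutor executor) {
        // putIfAbsent guards against registering the same task twice.
        if (executors.putIfAbsent(executor.id(), executor) != null) {
            throw new IllegalStateException("TaskExecutor " + executor.id() + " already exists");
        }
    }

    Optional<TaskExecutor> get(int id) { return Optional.ofNullable(executors.get(id)); }

    // Called when the executor finishes, so the repository only ever holds running tasks.
    void onFinished(TaskExecutor executor) {
        executor.markFinished();
        executors.remove(executor.id(), executor);
    }

    Collection<TaskExecutor> getAll() { return executors.values(); }
}
```

Using the two-argument `remove(key, value)` means a late finish callback from an old executor cannot evict a newer executor registered under the same id.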

TaskExecutorContainerDelegator

A delegator that sits in front of the TaskExecutorContainers.

Each TaskExecutorContainer contains a number of TaskExecutorWorkers. Each TaskExecutorWorker is driven by a single thread, so the ratio of workers to threads is 1:1.

However, not every task blocks a thread for its whole lifecycle, so there are two kinds of TaskExecutorContainer.
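The delegator's role can be pictured as a router in front of the two container kinds. In this hedged sketch, the `blocksThread` flag on the hypothetical `TaskSpec` record stands in for however the real module decides whether a task holds a thread for its whole lifecycle; that criterion, `RecordingContainer`, and the `execute` method are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical task descriptor: whether a task pins a thread for its whole lifecycle
// is modelled as a simple boolean flag.
record TaskSpec(int id, boolean blocksThread) {}

interface TaskExecutorContainer {
    void execute(TaskSpec task);
}

// Records the ids it received so the routing can be observed; a real container would run the task.
final class RecordingContainer implements TaskExecutorContainer {
    final List<Integer> received = new ArrayList<>();

    @Override
    public void execute(TaskSpec task) { received.add(task.id()); }
}

// Hides the two container kinds from callers and picks one per task.
final class TaskExecutorContainerDelegator {
    private final TaskExecutorContainer shared;
    private final TaskExecutorContainer exclusive;

    TaskExecutorContainerDelegator(TaskExecutorContainer shared, TaskExecutorContainer exclusive) {
        this.shared = shared;
        this.exclusive = exclusive;
    }

    void execute(TaskSpec task) {
        // Tasks that would pin a thread get a dedicated worker; everything else shares workers.
        (task.blocksThread() ? exclusive : shared).execute(task);
    }
}
```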

SharedThreadTaskExecutorContainer

One TaskExecutorWorker can be assigned multiple tasks.

[Figure: SharedThreadTaskExecutorContainer]
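A sketch of the shared-thread variant, under assumptions: each worker is a single-thread executor (keeping the 1:1 worker/thread ratio), tasks are pinned to workers round-robin, and the names `assign`, `submitStep` and `unassign` are hypothetical. Several tasks land on the same worker, so their steps must be short and non-blocking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Shared-thread container: a fixed set of workers, each backed by exactly one thread,
// and each worker may hold many tasks. Round-robin assignment is an assumed policy.
final class SharedThreadTaskExecutorContainer {
    private final List<ExecutorService> workers = new ArrayList<>();
    private final Map<Integer, Integer> assignment = new ConcurrentHashMap<>();
    private final AtomicInteger next = new AtomicInteger();

    SharedThreadTaskExecutorContainer(int workerCount) {
        for (int i = 0; i < workerCount; i++) {
            workers.add(Executors.newSingleThreadExecutor());
        }
    }

    // Pins a task to one worker; all later steps of that task run on the same worker thread.
    int assign(int taskId) {
        return assignment.computeIfAbsent(taskId,
                id -> Math.floorMod(next.getAndIncrement(), workers.size()));
    }

    // Steps must be short and non-blocking, otherwise they stall every other task on the worker.
    Future<?> submitStep(int taskId, Runnable step) {
        Integer worker = assignment.get(taskId);
        if (worker == null) {
            throw new IllegalStateException("task " + taskId + " is not assigned");
        }
        return workers.get(worker).submit(step);
    }

    void unassign(int taskId) { assignment.remove(taskId); }

    void shutdown() { workers.forEach(ExecutorService::shutdown); }
}
```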

ExclusiveThreadTaskExecutorContainer

One TaskExecutorWorker can only be assigned one task.

[Figure: ExclusiveThreadTaskExecutorContainer]
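A sketch of the exclusive-thread variant, assuming a fixed cap on workers enforced by a `Semaphore` and a reject-when-full policy (both assumptions, as are the names `tryExecute` and `availableWorkers`). Each accepted task occupies one worker, and therefore one thread, until it ends.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;

// Exclusive-thread container: every running task owns one worker (and thus one thread)
// for its whole lifecycle. The worker cap and reject-when-full policy are illustrative.
final class ExclusiveThreadTaskExecutorContainer {
    private final Semaphore freeWorkers;
    private final ExecutorService threads = Executors.newCachedThreadPool();

    ExclusiveThreadTaskExecutorContainer(int maxWorkers) {
        this.freeWorkers = new Semaphore(maxWorkers);
    }

    // Returns null when every worker is busy; the caller could retry or queue the task.
    Future<?> tryExecute(Runnable blockingTask) {
        if (!freeWorkers.tryAcquire()) {
            return null;
        }
        try {
            return threads.submit(() -> {
                try {
                    blockingTask.run();
                } finally {
                    freeWorkers.release(); // the worker becomes free only when the task ends
                }
            });
        } catch (RuntimeException e) {
            freeWorkers.release(); // submission failed, so give the worker back
            throw e;
        }
    }

    int availableWorkers() { return freeWorkers.availablePermits(); }

    void shutdown() { threads.shutdown(); }
}
```

The trade-off against the shared variant is isolation versus thread count: a blocking task cannot starve others here, but each one costs a thread.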

TaskExecutorEventBusCoordinator

The TaskExecutorEventBusCoordinator is used to assign TaskExecutors to, and unassign them from, TaskExecutorEventBusFireWorkers.
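A hedged sketch of the coordinator's bookkeeping. The least-loaded assignment policy, the integer ids, and the method names `assign`, `unassign` and `load` are assumptions; the proposal only states that the coordinator assigns and unassigns TaskExecutors to fire workers.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical fire worker: owns the set of TaskExecutor ids whose event buses it fires.
final class TaskExecutorEventBusFireWorker {
    final int index;
    final Set<Integer> assigned = new HashSet<>();

    TaskExecutorEventBusFireWorker(int index) { this.index = index; }
}

// Assigns each TaskExecutor to the fire worker with the fewest executors (an assumed policy)
// and unassigns it when the executor is done.
final class TaskExecutorEventBusCoordinator {
    private final TaskExecutorEventBusFireWorker[] workers;
    private final Map<Integer, TaskExecutorEventBusFireWorker> owner = new HashMap<>();

    TaskExecutorEventBusCoordinator(int workerCount) {
        workers = new TaskExecutorEventBusFireWorker[workerCount];
        for (int i = 0; i < workerCount; i++) {
            workers[i] = new TaskExecutorEventBusFireWorker(i);
        }
    }

    synchronized int assign(int taskExecutorId) {
        TaskExecutorEventBusFireWorker existing = owner.get(taskExecutorId);
        if (existing != null) {
            return existing.index; // idempotent: one executor is fired by exactly one worker
        }
        TaskExecutorEventBusFireWorker least = workers[0];
        for (TaskExecutorEventBusFireWorker w : workers) {
            if (w.assigned.size() < least.assigned.size()) {
                least = w;
            }
        }
        least.assigned.add(taskExecutorId);
        owner.put(taskExecutorId, least);
        return least.index;
    }

    synchronized void unassign(int taskExecutorId) {
        TaskExecutorEventBusFireWorker w = owner.remove(taskExecutorId);
        if (w != null) {
            w.assigned.remove(taskExecutorId);
        }
    }

    synchronized int load(int workerIndex) { return workers[workerIndex].assigned.size(); }
}
```

Binding each executor to a single fire worker is one way to keep its events on one thread, consistent with the ordered firing described for the EventBus.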

Lifecycle of TaskExecutor

[Figures: lifecycle of a TaskExecutor]

Compatibility, Deprecation, and Migration Plan

Compatible with previous versions.

Test Plan

Tested by integration tests (IT) and E2E tests. New IT cases will be added.
