apache / dolphinscheduler

Apache DolphinScheduler is a modern data orchestration platform, agile to create high-performance workflows with low code
https://dolphinscheduler.apache.org/
Apache License 2.0

[DSIP-16][Task] Support stream task #11352

Open caishunfeng opened 1 year ago

caishunfeng commented 1 year ago

Search before asking

Purpose

At present, DS only schedules offline (batch) tasks, but as businesses require task results to be available more and more promptly, stream-batch integration scenarios are becoming increasingly common, so it is necessary to support real-time (stream) tasks.

Use case

Related items

github-actions[bot] commented 1 year ago

Thank you for your feedback, we have received your issue. Please wait patiently for a reply.

davidzollo commented 2 months ago

Now the question list:

1. There is no checkpoint/savepoint management.
2. If the DS service is republished/restarted, the current logic is to kill the old task first and then reschedule a new one. Is this logic reasonable for a stream task? A rescheduled stream task should be started from the latest checkpoint, otherwise data will be lost.
3. Submitting a Flink stream task requires the local pid to stay alive; if the pid exits abnormally, the Flink task can no longer be killed (the current DS kill logic is based on in-memory state, and once the pid exits the task's memory is cleared). See the pid-independent sketch after this list.
4. There is no logic to restart a task after an abnormal failure. For example, after the YARN cluster goes down and recovers, will the tasks be rescheduled? Do we need to add task status detection logic?
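As a minimal, non-authoritative sketch of questions 3 and 4: if DS persisted the Flink job id at submit time, the worker could detect the job's status and kill it through the Flink JobManager's REST API instead of through the local pid. The host, port, and job id below are placeholders, not anything DS currently stores.

```java
// Hedged sketch: query and cancel a Flink job by job id instead of relying on the local pid.
// The REST base address and the job id are placeholder assumptions.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FlinkJobProbe {

    private static final String REST_BASE = "http://jobmanager-host:8081"; // placeholder address

    private final HttpClient client = HttpClient.newHttpClient();

    /** Returns the raw JSON from GET /jobs/<jobId>; its "state" field carries RUNNING/FAILED/etc. (question 4). */
    public String queryJob(String jobId) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(REST_BASE + "/jobs/" + jobId))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Cancels the job via PATCH /jobs/<jobId>?mode=cancel, which still works after the submitting pid is gone (question 3). */
    public int cancelJob(String jobId) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(REST_BASE + "/jobs/" + jobId + "?mode=cancel"))
                .method("PATCH", HttpRequest.BodyPublishers.noBody())
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    public static void main(String[] args) throws Exception {
        FlinkJobProbe probe = new FlinkJobProbe();
        String jobId = "<flink-job-id>"; // placeholder: would need to be persisted by DS at submit time
        System.out.println(probe.queryJob(jobId));
        System.out.println(probe.cancelJob(jobId));
    }
}
```

If DS recorded the Flink job id when the task is submitted, both kill and status detection would no longer depend on the worker's process table.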

Extension: currently, Flink task submission is done through the shell; should follow-up work keep the shell approach or move to the API? A shell-based stop/restart sketch follows.
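For questions 1 and 2, and assuming submission stays shell-based for now, here is a minimal sketch of a stop-with-savepoint / restart-from-savepoint cycle built on the standard `flink stop -p` and `flink run -s` CLI options. The job id, savepoint directory, and jar path are placeholders, and the savepoint-path parsing is only illustrative since the CLI output differs across Flink versions.

```java
// Hedged sketch: restart a stream task from its latest savepoint instead of plain kill-and-resubmit.
// Assumes the `flink` CLI is on PATH; job id, savepoint directory, and jar path are placeholders.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StreamTaskRestartSketch {

    /** Runs a command, returns combined stdout/stderr, and fails fast on a non-zero exit code. */
    private static String run(List<String> command) throws Exception {
        Process process = new ProcessBuilder(command).redirectErrorStream(true).start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        if (process.waitFor() != 0) {
            throw new IllegalStateException("Command failed: " + command + "\n" + output);
        }
        return output.toString();
    }

    public static void main(String[] args) throws Exception {
        String jobId = "<flink-job-id>";                  // placeholder
        String savepointDir = "hdfs:///flink/savepoints"; // placeholder: must survive worker restarts
        String jobJar = "/path/to/stream-job.jar";        // placeholder

        // 1. Stop the old job with a savepoint instead of killing it outright (question 2).
        String stopOutput = run(List.of("flink", "stop", "-p", savepointDir, jobId));

        // 2. Record the savepoint path; DS would need to persist it (question 1: savepoint management).
        //    The pattern is only illustrative; the exact line printed varies across Flink versions.
        Matcher matcher = Pattern.compile("(hdfs://\\S+|file:/\\S+)").matcher(stopOutput);
        if (!matcher.find()) {
            throw new IllegalStateException("No savepoint path found in:\n" + stopOutput);
        }
        String savepointPath = matcher.group(1);

        // 3. Resubmit the new task definition from that savepoint so no data is lost.
        run(List.of("flink", "run", "-d", "-s", savepointPath, jobJar));
    }
}
```

Whether this cycle should stay on the CLI or move to the REST API is exactly the open extension question above.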



davidzollo commented 2 weeks ago

If there is anybody who'd like to implement this feature, please leave a message. Thanks!