[Improvement] [Seatunnel JOB] Cannot adapt to Seatunnel's checkpoint mechanism?

apache / dolphinscheduler

Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code

Apache License 2.0


Search before asking

[X] I have searched the issues and found no similar issues.

What happened

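Seatunnel has its own checkpoint mechanism, and DolphinScheduler also has a fault-tolerance recovery mechanism. The two are not yet fully integrated, and this causes a bug. Verified as follows:

1. Deploy a Seatunnel CDC task through DolphinScheduler.
2. Simulate an unexpected crash: kill the DolphinScheduler task process (the Seatunnelclient task is not killed at this point).
3. Start DolphinScheduler again.
4. DolphinScheduler now runs its fault-tolerance recovery and submits new Seatunnelclient tasks.
5. When there are many Seatunnelclient tasks, they are recovered one after another, so duplicates of the same Seatunnel task are created. With a large number of tasks, CPU usage spikes within a short time and eventually causes a cascading failure (avalanche).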

What you expected to happen

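1. DolphinScheduler's fault-tolerance recovery currently appears to run concurrently. Given the number of tasks, should recovery limit its concurrency, or even recover tasks serially?
2. When the scheduler crashes unexpectedly and, on restart, finds that the Seatunnel (st) task was never killed, that task should not need to be recovered.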
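The two expectations above can be sketched roughly as follows. This is not DolphinScheduler code: `InterruptedTask`, `recover_tasks`, `is_alive`, and `resubmit` are hypothetical names, and how to tell whether a Seatunnel job is still running (for example, by querying the Seatunnel engine with the job ID) is an assumption left to the caller-supplied `is_alive` function.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class InterruptedTask:
    # Hypothetical record of a task that was running when the scheduler crashed.
    task_id: int
    job_id: str  # hypothetical handle of the Seatunnel job this task launched


def recover_tasks(
    tasks: Iterable[InterruptedTask],
    is_alive: Callable[[InterruptedTask], bool],
    resubmit: Callable[[InterruptedTask], str],
    max_concurrency: int = 1,
) -> List[str]:
    """Resubmit only tasks whose Seatunnel job is no longer running.

    max_concurrency=1 recovers tasks strictly one at a time; larger values
    cap how many resubmissions may run at the same moment.
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be >= 1")
    # Expectation 2: skip tasks whose Seatunnel job survived the crash,
    # so no duplicate Seatunnel job is created for them.
    to_recover = [t for t in tasks if not is_alive(t)]
    # Expectation 1: a bounded thread pool limits concurrent resubmissions
    # instead of resubmitting every interrupted task at once.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map returns results in the same order as to_recover.
        return list(pool.map(resubmit, to_recover))


if __name__ == "__main__":
    still_running = {"job-1", "job-3"}  # jobs that survived the crash
    tasks = [InterruptedTask(i, f"job-{i}") for i in range(1, 5)]
    print(recover_tasks(
        tasks,
        is_alive=lambda t: t.job_id in still_running,
        resubmit=lambda t: f"resubmitted-{t.task_id}",
        max_concurrency=2,
    ))  # ['resubmitted-2', 'resubmitted-4']
```

Serial recovery (`max_concurrency=1`) is the safest choice for CPU load but the slowest; a small bound trades some recovery speed for a predictable, limited burst of new Seatunnel clients.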

How to reproduce

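1. Deploy a Seatunnel CDC task through DolphinScheduler.
2. Simulate an unexpected crash: kill the DolphinScheduler task process (the Seatunnelclient task is not killed at this point).
3. Start DolphinScheduler again.
4. DolphinScheduler now runs its fault-tolerance recovery and submits new Seatunnelclient tasks.
5. When there are many Seatunnelclient tasks, they are recovered one after another, so duplicates of the same Seatunnel task are created. With a large number of tasks, CPU usage spikes within a short time and eventually causes a cascading failure (avalanche).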

Anything else

No response

Version

dev

Are you willing to submit PR?

[ ] Yes I am willing to submit a PR!

Code of Conduct

[X] I agree to follow this project's Code of Conduct