Closed 13813586515 closed 1 month ago
Duplicated with #16442
Search before asking
What happened
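The current deployment is 1 master and 3 workers. Steps taken:
1. Configured an st task through ds.
2. Stopped all three worker-servers.
3. Started the three worker-servers one after another.
The following problem occurred:
1. The st task was not killed when the three worker-servers went down. The main reason is that ds only submits the task to st; the task itself is run by the st server. When a worker-server starts again, however, it finds that the previous task stopped unexpectedly and launches a new one. The original st task is now duplicated, CPU and memory are exhausted, and ds shows two entries for the same task: one running and one in the "need fault tolerance" state.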
What you expected to happen
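ds's monitoring of st tasks is incomplete: when it launches a new st task, it does not check with the st server whether the task is actually still running.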
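The expected check can be sketched roughly as follows. This is only an illustration of the idea, not ds or st code: `failover_action`, `Action`, the `query_status` callback, and the `"RUNNING"` state string are all hypothetical names, and a real fix would need to query the st server through whatever interface it actually exposes.

```python
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    RESUBMIT = "resubmit"  # no live job found, so launch a new one
    REATTACH = "reattach"  # job is still running, so keep tracking it


def failover_action(job_id: Optional[str],
                    query_status: Callable[[str], Optional[str]]) -> Action:
    """Decide what to do with a task whose worker-server went down.

    job_id       -- hypothetical id of the job previously submitted to the st server
    query_status -- hypothetical callback that asks the st server for the job's
                    state; returns a state string, or None if the job is unknown
    """
    if job_id is None:
        # Nothing was ever submitted, so there is nothing to duplicate.
        return Action.RESUBMIT
    if query_status(job_id) == "RUNNING":
        # The job survived the worker outage; resubmitting would double it.
        return Action.REATTACH
    return Action.RESUBMIT


# Example with an in-memory stand-in for the st server's job table.
jobs_on_st_server = {"job-1": "RUNNING"}
decision = failover_action("job-1", jobs_on_st_server.get)
```

With a check like this, the restarted worker-server would reattach to the surviving job instead of creating a second copy of it.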
How to reproduce
1. Configure an st task through ds.
2. Stop all three worker-servers.
3. Start the three worker-servers one after another.
Anything else
No response
Version
3.2.x
Are you willing to submit PR?
Code of Conduct