ShifuML / shifu

An end-to-end machine learning and data mining framework on Hadoop
https://github.com/ShifuML/shifu/wiki
Apache License 2.0

TensorFlow Straggler Mitigation by Speculative Execution #644

Open zhangpengshan opened 5 years ago

zhangpengshan commented 5 years ago

In each iteration, collect per-worker stats and check for slow workers, for example by using the standard deviation to flag outliers. If any worker is an outlier, a standby backup worker from the backup pool could be run in its place.
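A minimal sketch of the per-iteration check described above, assuming each worker reports its wall-clock time for the iteration. The function name `find_stragglers`, the worker ids, and the threshold factor `k` are hypothetical and not part of Shifu or TensorFlow. A worker is flagged when its time exceeds the mean by more than `k` standard deviations.

```python
from statistics import mean, pstdev


def find_stragglers(durations, k=1.5):
    """Return ids of workers whose iteration time exceeds mean + k * stddev.

    durations: dict mapping a (hypothetical) worker id to its time in seconds.
    k: illustrative threshold factor, an assumption to be tuned per cluster.
    """
    times = list(durations.values())
    if len(times) < 2:
        return []  # not enough samples to talk about outliers
    mu = mean(times)
    sigma = pstdev(times)  # population standard deviation
    if sigma == 0:
        return []  # all workers took exactly the same time
    threshold = mu + k * sigma
    return sorted(w for w, t in durations.items() if t > threshold)


# Example: four workers finish in about a second, one takes five seconds.
iteration_times = {
    "worker-0": 1.0,
    "worker-1": 1.1,
    "worker-2": 0.9,
    "worker-3": 1.0,
    "worker-4": 5.0,
}
stragglers = find_stragglers(iteration_times)
print(stragglers)
```

With a small worker count the threshold factor has to stay low: for n samples the largest possible population z-score is sqrt(n - 1), so with five workers a threshold of 2 standard deviations can never fire. A median-based rule may be more robust, since one very slow worker inflates both the mean and the standard deviation.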

Mrhs121 commented 4 years ago

There is no need to do that. Backup workers are already implemented in TF. That means each iteration only takes the results of the fastest N workers and gives up on the slowest C stragglers.
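The "fastest N of N + C" idea can be illustrated in plain Python. This is only a simulation of the selection step under assumed inputs, not TensorFlow's implementation or API; `aggregate_fastest`, the worker ids, and the scalar "gradients" are hypothetical.

```python
def aggregate_fastest(results, n):
    """Average the gradients of the first n workers to finish.

    results: list of (finish_time, worker_id, gradient) tuples, one per
             worker, where N + C workers were launched in total.
    n: number of results to aggregate; the remaining C are discarded.
    Returns (average_gradient, kept_worker_ids, dropped_worker_ids).
    """
    if n <= 0 or n > len(results):
        raise ValueError("n must be between 1 and the number of workers")
    ordered = sorted(results, key=lambda r: r[0])  # earliest finisher first
    kept, dropped = ordered[:n], ordered[n:]
    average = sum(g for _, _, g in kept) / n
    return average, [w for _, w, _ in kept], [w for _, w, _ in dropped]


# Example: N = 3 results are aggregated, C = 1 backup worker is launched.
results = [
    (1.0, "w0", 0.5),
    (5.0, "w1", 9.0),  # straggler: its gradient is ignored this iteration
    (1.2, "w2", 1.5),
    (0.9, "w3", 1.0),
]
average, kept, dropped = aggregate_fastest(results, n=3)
print(average, kept, dropped)
```

Compared with detecting outliers and then launching a standby worker, this approach needs no statistics: the extra C workers always run, and whichever C results arrive last are dropped.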