Status: Open. duyanghao opened this pull request 6 years ago.
Signed-off-by: duyanghao <1294057873@qq.com>
What changes were proposed in this pull request?
Add recovery logic for failed pods and fix the MEM_EXCEEDED_EXIT_CODE constant.
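For context, the following is a minimal sketch of how an executor container's exit code might be classified when deciding whether a failure was memory-related. It assumes, hypothetically, that MEM_EXCEEDED_EXIT_CODE should be 137. That is the code reported for a container whose main process was killed by SIGKILL (128 + 9), which is the signal the kernel OOM killer sends. The value this PR actually assigns is not stated here, and `classify_exit` is a hypothetical helper, not Spark's API.

```python
# Hypothetical value: 128 + SIGKILL (9) = 137, the exit code reported for a
# container whose process was SIGKILLed, e.g. by the OOM killer.
# Not necessarily the value this PR assigns to MEM_EXCEEDED_EXIT_CODE.
MEM_EXCEEDED_EXIT_CODE = 137


def classify_exit(exit_code: int) -> str:
    """Return a coarse reason string for an executor container's exit code."""
    if exit_code == 0:
        return "completed"
    if exit_code == MEM_EXCEEDED_EXIT_CODE:
        return "memory limit exceeded (likely OOM-killed)"
    return f"failed with exit code {exit_code}"
```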
How was this patch tested?
Manual tests confirm that failed executor pods are recovered, as shown below:
spark.executor.instances=5
# kubectl get pods -n=xxx -a -o wide|grep spark-debug-sar-test8
spark-debug-sar-test8          1/1   Completed   0   3m    192.168.25.92   x.x.x.x
spark-debug-sar-test8-exec-1   1/1   Completed   0   3m    192.168.25.94   x.x.x.x
spark-debug-sar-test8-exec-2   1/1   Completed   0   3m    192.168.25.93   x.x.x.x
spark-debug-sar-test8-exec-3   0/1   Error       0   3m    192.168.11.31   x.x.x.x
spark-debug-sar-test8-exec-4   0/1   Error       0   3m    192.168.11.37   x.x.x.x
spark-debug-sar-test8-exec-5   0/1   Error       0   3m    192.168.11.44   x.x.x.x
spark-debug-sar-test8-exec-6   1/1   Completed   0   48s   192.168.25.99   x.x.x.x
spark-debug-sar-test8-exec-7   1/1   Completed   0   48s   192.168.25.95   x.x.x.x
spark-debug-sar-test8-exec-8   1/1   Completed   0   48s   192.168.25.97   x.x.x.x
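The pod listing is consistent with a simple replacement policy. Each executor pod that ends in Error is not restarted. Instead, a newly numbered executor is requested, so the count of non-failed executors returns to spark.executor.instances. The following Python sketch models that bookkeeping under two assumptions: replacements receive the next unused executor ID, and exec-1 and exec-2 were still running when exec-3 to exec-5 failed. `plan_replacements` is an illustrative name, not the PR's Scala code.

```python
from typing import Dict, List


def plan_replacements(statuses: Dict[int, str], target: int) -> List[int]:
    """Return new executor IDs to request so that `target` executors are not failed.

    `statuses` maps executor ID -> pod status as printed by kubectl
    (e.g. "Running", "Completed", "Error"). Failed pods are left in place;
    fresh executors with the next unused IDs are requested instead.
    """
    healthy = sum(1 for s in statuses.values() if s != "Error")
    missing = max(target - healthy, 0)
    next_id = max(statuses, default=0) + 1
    return list(range(next_id, next_id + missing))


# Assumed executor states at the moment exec-3..5 failed (spark.executor.instances=5).
first_round = {1: "Running", 2: "Running", 3: "Error", 4: "Error", 5: "Error"}
print(plan_replacements(first_round, target=5))  # [6, 7, 8]
```

Once executors 6 to 8 are up, calling the function again returns an empty list, because five executors are no longer in Error. A real scheduler backend would likely also want to cap the number of replacements so that a persistently failing executor does not trigger endless retries. That is outside the scope of this sketch.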