apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

Driver pod does not handle some executor pod failures #600

Open ChenLingPeng opened 6 years ago

ChenLingPeng commented 6 years ago

If an executor pod fails before it registers with the driver, the driver is not aware of the failure and does not create a replacement executor.

For example, the screenshot below shows an exception raised by CoarseGrainedExecutorBackend.askAsync before the executor registers with the driver:

image

The driver keeps expecting the executor to register but never handles the failed executor, as the log below shows:

image

The reason is that, when watching executor pods, we do not handle the failed-executor event, which arrives as Action.MODIFIED with the pod in the Failed phase.

I will open a pull request to fix this.
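
To illustrate the missing handling described above, here is a minimal, self-contained sketch of a watch handler that reacts to a MODIFIED event whose pod is in the Failed phase. It is not the actual Spark or Kubernetes client code: `Action`, `PodEvent`, `DriverState`, and all field names are hypothetical stand-ins, and the real fix would live in the driver's executor pod watcher.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ExecutorWatchSketch {
    // Hypothetical stand-in for the watch action type of a Kubernetes client.
    enum Action { ADDED, MODIFIED, DELETED, ERROR }

    // Hypothetical stand-in carrying only the pod fields this sketch needs.
    static final class PodEvent {
        final Action action;
        final String podName;
        final String phase;

        PodEvent(Action action, String podName, String phase) {
            this.action = action;
            this.podName = podName;
            this.phase = phase;
        }
    }

    // Hypothetical driver-side bookkeeping for executor pods.
    static final class DriverState {
        // Pods that were requested but have not registered with the driver yet.
        final Set<String> pendingPods = new HashSet<>();
        // Names of failed pods for which a replacement should be requested.
        final List<String> replacementRequests = new ArrayList<>();

        void onEvent(PodEvent event) {
            // The case from this issue: the pod was modified and is now Failed.
            boolean failed = event.action == Action.MODIFIED
                    && "Failed".equals(event.phase);
            if (!failed) {
                return;
            }
            // remove() returns true only the first time, so repeated
            // MODIFIED events for the same pod do not request two replacements.
            if (pendingPods.remove(event.podName)) {
                replacementRequests.add(event.podName);
            }
        }
    }

    public static void main(String[] args) {
        DriverState state = new DriverState();
        state.pendingPods.add("exec-1");
        state.pendingPods.add("exec-2");

        // exec-1 fails before it ever registers with the driver.
        state.onEvent(new PodEvent(Action.MODIFIED, "exec-1", "Failed"));
        // exec-2 is merely updated while still running; nothing should happen.
        state.onEvent(new PodEvent(Action.MODIFIED, "exec-2", "Running"));

        System.out.println("Replacements requested for: " + state.replacementRequests);
    }
}
```

The sketch only covers executors that fail while still pending registration; executors that fail after registering would need the driver's existing loss handling as well.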