apache-spark-on-k8s / spark

Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end now happens in https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

Delete Kubernetes resources when the client waits for and sees app completion #520

Closed by liyinan926 6 years ago

liyinan926 commented 7 years ago

What changes were proposed in this pull request?

This PR fixes #519 for the case where the submission client waits for the submitted application to finish. Upon completion of the application, the submission client deletes all Kubernetes resources created for the application to run.
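
A minimal sketch of what this kind of cleanup could look like, assuming the fabric8 Kubernetes client this backend builds on and a `spark-app-selector` label carrying the application ID on every resource the client creates; the helper name is illustrative, not taken from the PR:

```scala
// Illustrative only: assumes every resource created for an application is
// labeled with the application ID under "spark-app-selector".
import io.fabric8.kubernetes.client.KubernetesClient

object SubmissionCleanup {
  def deleteAppResources(client: KubernetesClient, namespace: String, appId: String): Unit = {
    // The label selector scopes each delete to this application's resources,
    // so other workloads in the namespace are left untouched.
    client.pods().inNamespace(namespace).withLabel("spark-app-selector", appId).delete()
    client.services().inNamespace(namespace).withLabel("spark-app-selector", appId).delete()
    client.configMaps().inNamespace(namespace).withLabel("spark-app-selector", appId).delete()
    client.secrets().inNamespace(namespace).withLabel("spark-app-selector", appId).delete()
  }
}
```

Driving every delete through the same label selector also keeps the operation idempotent: rerunning it after a partial failure is safe.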

liyinan926 commented 6 years ago

rerun integration tests please

liyinan926 commented 6 years ago

I want to reiterate on this issue/PR. If we have concerns about losing some objects, like the ConfigMap for setting up the init-container, as I said above we could log the information stored in it for debugging purposes. This, IMO, is better than making the ConfigMap stick around just for debugging. Thoughts?

@mccheah @foxish
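
A sketch of the "log it, then delete it" idea, again assuming a fabric8 client; the method name and the use of println rather than Spark's logging are illustrative:

```scala
// Illustrative only: logs a ConfigMap's contents before deleting it, so the
// init-container configuration survives in the logs for post-mortem debugging.
import io.fabric8.kubernetes.client.KubernetesClient
import scala.collection.JavaConverters._

def logAndDeleteConfigMap(client: KubernetesClient, namespace: String, name: String): Unit = {
  val cm = client.configMaps().inNamespace(namespace).withName(name).get()
  if (cm != null && cm.getData != null) {
    cm.getData.asScala.foreach { case (key, value) =>
      println(s"ConfigMap $name entry $key:\n$value")
    }
  }
  client.configMaps().inNamespace(namespace).withName(name).delete()
}
```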

dharmeshkakadia commented 6 years ago

Any thoughts on this? It would be good to clean up resources after completion. In a normal scenario, for example, this is filling the cluster with a lot of services in a completed state.

dharmeshkakadia commented 6 years ago

Thanks @felixcheung for jumping on this :)

felixcheung commented 6 years ago

Hey, where are we on this? And how about going upstream?

liyinan926 commented 6 years ago

@felixcheung Yes, I think we should go upstream. I created https://issues.apache.org/jira/browse/SPARK-23571. Also, given that we are in the process of getting rid of the init-container, the ConfigMap for the init-container will be gone as well, so it makes even more sense to clean up after application completion.

foxish commented 6 years ago

Sorry, didn't see this before. Same comment as in https://github.com/apache/spark/pull/20722#discussion_r171968410. Why not do this during driver.stop()? That way: 1) if we lose the driver, Kubernetes garbage collection cleans up everything; 2) if the driver terminates cleanly, we clean up executors as well as auxiliary resources like ConfigMaps.
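
An illustrative sketch of the owner-reference mechanism this relies on: if an auxiliary object lists the driver pod as its owner, Kubernetes garbage collection removes it whenever the driver pod is deleted. The helper name is hypothetical:

```scala
// Illustrative only: attaches an owner reference pointing at the driver pod
// to a ConfigMap. Once set, Kubernetes garbage-collects the ConfigMap
// automatically when the driver pod is deleted.
import io.fabric8.kubernetes.api.model.{ConfigMap, ConfigMapBuilder, OwnerReferenceBuilder, Pod}

def ownedByDriver(driverPod: Pod, configMap: ConfigMap): ConfigMap = {
  val ownerRef = new OwnerReferenceBuilder()
    .withApiVersion("v1")
    .withKind("Pod")
    .withName(driverPod.getMetadata.getName)
    .withUid(driverPod.getMetadata.getUid)
    .withController(true)
    .build()
  new ConfigMapBuilder(configMap)
    .editOrNewMetadata()
      .addToOwnerReferences(ownerRef)
    .endMetadata()
    .build()
}
```

With ownership in place, the submission client no longer needs its own delete pass: tearing down the driver pod cascades to everything it owns.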

foxish commented 6 years ago

> I want to reiterate on this issue/PR. If we have concerns about losing some objects, like the ConfigMap for setting up the init-container, as I said above we could log the information stored in it for debugging purposes. This, IMO, is better than making the ConfigMap stick around just for debugging. Thoughts?

I agree. We can dump all k8s objects. My hunch is that it's not that useful, given it's a pretty deeply buried implementation detail.

liyinan926 commented 6 years ago

As discussed in https://github.com/apache/spark/pull/20722, we think the right solution is to move resource management into the driver pod. This way, cleanup of auxiliary resources upon completion is guaranteed regardless of which deployment mode is used and whether or not the client waits for the application to complete.