liyinan926 closed this 6 years ago
rerun integration tests please
I want to revisit this issue/PR. If the concern is about losing objects like the ConfigMap used to set up the init-container, then, as I said above, we could log the information stored in it for debugging purposes. This, IMO, is better than keeping the ConfigMap around just for debugging. Thoughts?
@mccheah @foxish
Any thoughts on this? It would be good to clean up resources after completion. Under normal use, for example, this leaves a lot of Services lingering for completed applications.
Thanks @felixcheung for jumping on this :)
Hey, where are we on this? And how about going upstream?
@felixcheung Yes, I think we should go upstream. I created https://issues.apache.org/jira/browse/SPARK-23571. Also given that we are in the process of getting rid of the init-container, the ConfigMap for the init-container will be gone also. So it makes more sense to clean up after application completion.
Sorry, didn't see this before. Same comment as in https://github.com/apache/spark/pull/20722#discussion_r171968410. Why not do this during driver.stop()? That way, 1) if we lose the driver, k8s garbage collection cleans up everything, and 2) if the driver terminates, we clean up executors as well as auxiliary resources like ConfigMaps.
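Point 1) above relies on Kubernetes' built-in garbage collection: if the executor pods and auxiliary objects declare the driver pod as their owner, deleting the driver cascades to everything it owns. A minimal sketch of what such an owner reference looks like (the resource names and uid below are illustrative, not what Spark actually generates):

```yaml
# Sketch: an executor Pod that names the driver pod as its owner.
# When the driver pod is deleted, the Kubernetes garbage collector
# removes this object too. Names and uid are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: spark-exec-1
  ownerReferences:
    - apiVersion: v1
      kind: Pod
      name: spark-driver
      uid: 1234abcd-0000-0000-0000-000000000000
      controller: true
```

The same ownerReferences stanza can be put on ConfigMaps and Services so that no client-side cleanup is needed when the driver pod goes away.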
> I want to re-iterate on this issue/PR. If we have concern around losing some objects like the ConfigMap for setting up the init-container, as I said above, we could log information stored in it for debugging purpose. This, IMO, is better than making the ConfigMap stick around just for debugging. Thoughts?
I agree. We can dump all k8s objects. My hunch is that it's not that useful, given it's a pretty deeply buried implementation detail.
As discussed in https://github.com/apache/spark/pull/20722, we think the right solution is to move resource management into the driver pod. This way, cleanup of auxiliary resources upon completion is guaranteed regardless of which deployment mode is used and whether or not the client waits for the application to complete.
What changes were proposed in this pull request?
This PR fixes #519 for the case where the submission client waits for the submitted application to finish. Upon completion of the application, the submission client deletes all Kubernetes resources created for the application to run.
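The cleanup described here follows a simple pattern: the submission client remembers every Kubernetes resource it creates for an application, and deletes them all once the application finishes. A hypothetical, language-agnostic sketch of that pattern (the actual implementation is in Spark's Scala submission client; `SubmissionClient` and `FakeKubernetesApi` below are illustrative names, not Spark's API):

```python
class FakeKubernetesApi:
    """Stand-in for a Kubernetes client; tracks live resources by (kind, name)."""

    def __init__(self):
        self.resources = set()

    def create(self, kind, name):
        self.resources.add((kind, name))

    def delete(self, kind, name):
        self.resources.discard((kind, name))


class SubmissionClient:
    """Creates resources for one application and cleans them up on completion."""

    def __init__(self, api):
        self.api = api
        self.created = []  # resources created for this application, in order

    def create_resource(self, kind, name):
        self.api.create(kind, name)
        self.created.append((kind, name))

    def on_application_completed(self):
        # Delete in reverse creation order so dependents go before dependencies.
        for kind, name in reversed(self.created):
            self.api.delete(kind, name)
        self.created.clear()


api = FakeKubernetesApi()
client = SubmissionClient(api)
client.create_resource("ConfigMap", "spark-init-config")
client.create_resource("Service", "spark-driver-svc")
client.create_resource("Pod", "spark-driver")
client.on_application_completed()
print(len(api.resources))  # all application resources cleaned up
```

Note this pattern only works when the client actually waits for completion; if the client exits early, cleanup has to fall back to driver-side ownership, as discussed above.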