apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0
612 stars 118 forks source link

Spark on Kubernetes - basic submission client #545

Closed liyinan926 closed 6 years ago

liyinan926 commented 6 years ago

Second draft upstreaming PR that contains the basic submission client implementation and unit tests. Branch spark-kubernetes-3-updated is a clone of spark-kubernetes-3 with latest changes from upstream/master merged in. spark-kubernetes-4 includes all our changes in spark-kubernetes-3.

cc @foxish @mccheah @apache-spark-on-k8s/contributors

foxish commented 6 years ago

This will be the follow-up to https://github.com/apache/spark/pull/19468

mccheah commented 6 years ago

This seems like a large diff, but a quick scan shows everything included as necessary. We need the driver service bootstrap because of changes to master. I think we can reduce the fanciness of the credentials step but that doesn't reduce the complexity by a significant amount.

foxish commented 6 years ago

One TODO: Add the unit test in https://github.com/apache-spark-on-k8s/spark/pull/542 to this PR

liyinan926 commented 6 years ago

Changed from #542 merged in.

mccheah commented 6 years ago

A few comments but otherwise this captures the spirit of what we want to have upstream.

liyinan926 commented 6 years ago

If there's no objection, I will squash the commits and push to upstream for review by EOD today. @apache-spark-on-k8s/contributors

foxish commented 6 years ago

SGTM! We should see how we can make it less confusing for reviewers - because this PR encompasses changes in spark-kubernetes-3.

liyinan926 commented 6 years ago

When pushing upstream, I'm gonna remove code for the first PR so this is less confusing.

kimoonkim commented 6 years ago

The latest commit seems to address my comments so far. Thanks!

liyinan926 commented 6 years ago

Squashed the commit and removed scheduler backend code and relevant changes in Yarn-related code.

liyinan926 commented 6 years ago

@kimoonkim any more comments on the submission steps?

liyinan926 commented 6 years ago

This is under review at https://github.com/apache/spark/pull/19717.

foxish commented 6 years ago

@ifilonenko @erikerlandson @mccheah Can you please help with integration tests here? We need signals on what's working and what's not working. We expect some integration tests to fail, but most should pass. @liyinan926 just informed me that we were missing some critical functionality with local:// file support. We want to ensure issues like that don't creep into our upstream PR.

foxish commented 6 years ago

cc @kimoonkim

liyinan926 commented 6 years ago

To elaborate, we are missing DependencyResolutionStep in this PR. Without it, users won't even be able to use local:// dependencies.

erikerlandson commented 6 years ago

we may be having a problem here if the docker poms were elided - the integration testing can't build images to run its tests, IIUC

foxish commented 6 years ago

Tracking in https://github.com/apache-spark-on-k8s/spark/issues/568. @erikerlandson, you're right. We need to separate out the integration testing now, so we can actually run it against any distribution - irrespective of whether it has the dockerfile target or not.

liyinan926 commented 6 years ago

It seems that the LocalDirectoryMountConfigurationStep is also essential. @mccheah @foxish @kimoonkim @erikerlandson @ifilonenko to confirm. If yes, I will add it to this PR.

liyinan926 commented 6 years ago

The PR has been merged upstream. Closing this.