apache-spark-on-k8s / spark

Apache Spark enhanced with native Kubernetes scheduler back-end: NOTE this repository is being ARCHIVED as all new development for the kubernetes scheduler back-end is now on https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

Cutting the Spark 2.2 release #398

Closed foxish closed 7 years ago

foxish commented 7 years ago

We can cut this and mark it as beta for now. However, I don't think we need to update the documentation to point to 2.2 yet. This is a prerequisite to announcing upstream. Steps:

  1. [x] Build a distribution with and without Hadoop
  2. [x] Tag the commit, draft a GitHub release, and upload the tar.gz files from (1)
  3. [x] Build and upload the images (appropriately tagged) to the kubespark repo
  4. [x] Fill in the release description on GitHub

@apache-spark-on-k8s/contributors

"blocking" PRs:

  1. [x] https://github.com/apache-spark-on-k8s/userdocs/pull/8
  2. [x] https://github.com/apache-spark-on-k8s/spark/pull/407
  3. [x] https://github.com/apache-spark-on-k8s/spark/pull/404
  4. [x] https://github.com/apache-spark-on-k8s/spark/pull/401
  5. [x] https://github.com/apache-spark-on-k8s/spark/pull/424
  6. [x] https://github.com/apache-spark-on-k8s/spark/pull/412

erikerlandson commented 7 years ago

I'm eager to announce, but I expect Python support to be a popular feature, and I'm still hoping for at least a quick-start example. Otherwise, IIUC, people will have nothing to refer to.

foxish commented 7 years ago

+1, @ifilonenko, could you please prioritize the docs?

erikerlandson commented 7 years ago

@foxish the docs in https://github.com/apache-spark-on-k8s/userdocs/pull/8 look good to me. If #407 passes, we should include that as well.

erikerlandson commented 7 years ago

I'd also like to merge #404 before cutting, unless there are any objections.

foxish commented 7 years ago

+1, makes sense to me @erikerlandson. Let's merge those and get the release out. We might have to cut another 2.1 as well because of the recently discovered PySpark bug.

erikerlandson commented 7 years ago

@liyinan926 for the sake of consistency, can you paste the dev/make-distribution.sh and build/mvn commands you used for the latest 2.1 build here? I'd like to make sure I'm using the same build options, to the extent possible.

erikerlandson commented 7 years ago

Recording my build commands here.

dist tarball:

$ ./dev/make-distribution.sh --pip --tgz -Pmesos -Pyarn -Pkinesis-asl -Phive -Phive-thriftserver -Pkubernetes -Phadoop-2.7 -Dhadoop.version=2.7.3
$ ./dev/make-distribution.sh --pip --tgz -Pmesos -Pyarn -Pkinesis-asl -Phive -Phive-thriftserver -Pkubernetes -Phadoop-2.7 -Dhadoop.version=2.7.3 -Phadoop-provided

image builds (from spark-2.2.0-k8s-0.3.0-hadoop-2.7.3.tgz):

$ sed -i 's/FROM spark-base/FROM kubespark\/spark-base:v2.2.0-kubernetes-0.3.0/' dockerfiles/*/Dockerfile
$ docker build -t kubespark/spark-base:v2.2.0-kubernetes-0.3.0 -f dockerfiles/spark-base/Dockerfile .
$ docker build -t kubespark/spark-driver:v2.2.0-kubernetes-0.3.0 -f dockerfiles/driver/Dockerfile .
$ docker build -t kubespark/driver-py:v2.2.0-kubernetes-0.3.0 -f dockerfiles/driver-py/Dockerfile .
$ docker build -t kubespark/spark-executor:v2.2.0-kubernetes-0.3.0 -f dockerfiles/executor/Dockerfile .
$ docker build -t kubespark/executor-py:v2.2.0-kubernetes-0.3.0 -f dockerfiles/executor-py/Dockerfile .
$ docker build -t kubespark/spark-init:v2.2.0-kubernetes-0.3.0 -f dockerfiles/init-container/Dockerfile .
$ docker build -t kubespark/spark-resource-staging-server:v2.2.0-kubernetes-0.3.0 -f dockerfiles/resource-staging-server/Dockerfile .
$ docker build -t kubespark/spark-shuffle:v2.2.0-kubernetes-0.3.0 -f dockerfiles/shuffle-service/Dockerfile .
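
The per-image builds above all follow one pattern, so they could be scripted. A minimal sketch, assuming it runs from the root of the unpacked distribution after the `sed` step above; the function name `build_images` and the `DRY_RUN` switch are hypothetical helpers, not part of the Spark tooling. The image names and Dockerfile directories are copied from the commands above.

```shell
#!/bin/sh
# Sketch only: loop over the image/Dockerfile pairs listed above.
TAG="v2.2.0-kubernetes-0.3.0"
REPO="kubespark"

# image-name:dockerfile-directory pairs, copied from the commands above.
# spark-base comes first because the other Dockerfiles build FROM it.
IMAGES="spark-base:spark-base
spark-driver:driver
driver-py:driver-py
spark-executor:executor
executor-py:executor-py
spark-init:init-container
spark-resource-staging-server:resource-staging-server
spark-shuffle:shuffle-service"

# DRY_RUN=1 (hypothetical switch, the default here) prints each command
# instead of running it; set DRY_RUN=0 to actually invoke docker.
build_images() {
  for pair in $IMAGES; do
    name="${pair%%:*}"   # text before the first colon
    dir="${pair#*:}"     # text after the first colon
    cmd="docker build -t $REPO/$name:$TAG -f dockerfiles/$dir/Dockerfile ."
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "$cmd"
    else
      $cmd || return 1
    fi
  done
}

build_images
```

Keeping the dry run as the default makes it easy to compare the generated commands against the list above before building anything.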

liyinan926 commented 7 years ago

@erikerlandson I used the following commands:

dev/make-distribution.sh --tgz -Phadoop-2.7 -Pkubernetes
dev/make-distribution.sh --tgz -Pkubernetes

erikerlandson commented 7 years ago

@liyinan926 thx. My reading of the docs is that omitting -Phadoop-x.x just defaults the build to hadoop-2.2.0. Should we be using -Phadoop-provided for the "no-hadoop" version?

Also, are you building the docker images from the (unpacked) tarballs?

cc @foxish

foxish commented 7 years ago

Good catch @erikerlandson. I had missed that in the documentation as well. We should use -Phadoop-provided, like you said.
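
One way to check whether a "no-hadoop" tarball really omits Hadoop is to list its contents and count the bundled Hadoop jars. A minimal sketch, assuming the distribution keeps its jars under a `jars/` directory and that Hadoop artifacts are named `hadoop-*.jar`; the function name `count_hadoop_jars` is hypothetical.

```shell
#!/bin/sh
# Sketch only: count Hadoop jars bundled in a Spark distribution tarball.
# Assumes jars live under <dist>/jars/ and are named hadoop-*.jar.
count_hadoop_jars() {
  # grep -c prints 0 but exits non-zero when nothing matches,
  # so "|| true" keeps the function's exit status clean.
  tar -tzf "$1" | grep -c '/jars/hadoop-[^/]*\.jar$' || true
}
```

For example, `count_hadoop_jars spark-2.2.0-k8s-0.3.0-hadoop-2.7.3.tgz` should print a non-zero count for the with-Hadoop tarball, while a tarball built with -Phadoop-provided would be expected to print 0.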

liyinan926 commented 7 years ago

Yes, I used the unpacked tarball (with hadoop) to build and push the images.

liyinan926 commented 7 years ago

@foxish @erikerlandson I will rebuild the no-hadoop distribution and upload it with -Phadoop-provided.

erikerlandson commented 7 years ago

@apache-spark-on-k8s/contributors I have cut tarballs and images for v2.2.0-kubernetes-0.3.0 - if somebody can smoke test these as a sanity check it would be great!
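
A smoke test could be as small as submitting the bundled SparkPi example against the new images. A rough sketch that only prints the command; the API server address is a placeholder, and the `spark.kubernetes.*.docker.image` config keys and the examples jar path are assumptions based on this fork's conventions, not taken from this thread, so check them against the user docs before running. The function name `smoke_test_cmd` is hypothetical.

```shell
#!/bin/sh
# Sketch only: print a SparkPi submission that uses the new images.
# K8S_MASTER is a placeholder; the config keys and the jar path below are
# assumptions and should be verified against the fork's user docs.
TAG="v2.2.0-kubernetes-0.3.0"
K8S_MASTER="${K8S_MASTER:-k8s://https://<apiserver-host>:<apiserver-port>}"

smoke_test_cmd() {
  cat <<EOF
bin/spark-submit \\
  --deploy-mode cluster \\
  --class org.apache.spark.examples.SparkPi \\
  --master $K8S_MASTER \\
  --conf spark.kubernetes.driver.docker.image=kubespark/spark-driver:$TAG \\
  --conf spark.kubernetes.executor.docker.image=kubespark/spark-executor:$TAG \\
  local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.3.0.jar
EOF
}

smoke_test_cmd
```

Printing the command rather than running it lets a tester fill in the cluster address and confirm the image tags first.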

erikerlandson commented 7 years ago

2.2 is cut and announced.