apache-spark-on-k8s / spark

Apache Spark enhanced with a native Kubernetes scheduler back-end. NOTE: this repository is being ARCHIVED, as all new development for the Kubernetes scheduler back-end now happens at https://github.com/apache/spark/
https://spark.apache.org/
Apache License 2.0

Document how to publish apache-spark-on-k8s releases #438

Open ash211 opened 7 years ago

ash211 commented 7 years ago

cc @erikerlandson

For reference, I use this script to build the Palantir Spark dist locally; it might be usable as a starting point:

#!/bin/bash

set -eu

spark_version=$(git describe --tags)
hadoop_version=hadoop-2.8.0-palantir6
docker_base=my.host.example.com/spark

echo cleaning...
./build/mvn clean

echo setting version number for publish...
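# versions:set rewrites the project poms to the git-describe version, so the
# 'install' goal below publishes jars/poms with that version to the local ~/.m2 repo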
./build/mvn versions:set -DnewVersion="$spark_version"

echo building jars and publishing locally...
./build/mvn -DskipTests -Phadoop-cloud -Phadoop-palantir -Pkinesis-asl -Pkubernetes -Phive -Pyarn install

echo making distribution...
./dev/make-distribution.sh --name "$hadoop_version" --tgz -Phadoop-cloud -Phadoop-palantir -Pkinesis-asl -Pkubernetes -Phive -Pyarn

echo publishing to local dir...
combined_version="$spark_version"-"$hadoop_version"
mkdir -p ~/.m2/repository/org/apache/spark/spark-dist/$combined_version/
cp spark-dist-$combined_version.tgz ~/.m2/repository/org/apache/spark/spark-dist/$combined_version/

echo building docker images...
tar -zxvf spark-dist-$combined_version.tgz
pushd spark-dist-$combined_version/

for type in driver executor init-container resource-staging-server shuffle-service; do
    docker build -t "$docker_base/$type:$spark_version" -f "dockerfiles/$type/Dockerfile" .
done

echo publishing docker images...
for type in driver executor init-container resource-staging-server shuffle-service; do
    docker push "$docker_base/$type:$spark_version"
done

popd
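
As a sanity check after running the script, something along these lines can confirm where the artifacts ended up (the grep pattern just mirrors the docker_base example above):

$ ls ~/.m2/repository/org/apache/spark/spark-dist/
$ docker images | grep my.host.example.com/spark
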
erikerlandson commented 7 years ago

Thanks @ash211 - For reference and comparison, I built the 2.2 release using the following:

$ ./dev/make-distribution.sh --pip --tgz -Pmesos -Pyarn -Pkinesis-asl -Phive -Phive-thriftserver -Pkubernetes -Phadoop-2.7 -Dhadoop.version=2.7.3
$ ./dev/make-distribution.sh --pip --tgz -Pmesos -Pyarn -Pkinesis-asl -Phive -Phive-thriftserver -Pkubernetes -Phadoop-2.7 -Dhadoop.version=2.7.3 -Phadoop-provided

I noticed that all but one unit test succeeded using only -Pkubernetes -Phadoop-2.7 -Dhadoop.version=2.7.3.
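
A minimal sketch of such a unit-test run, assuming the bundled Maven wrapper (the invocation is illustrative, not quoted from the comment):

$ ./build/mvn test -Pkubernetes -Phadoop-2.7 -Dhadoop.version=2.7.3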

I was unable to run the integration tests out of the box because I use KVM instead of VirtualBox for minikube. However, I can probably spin minikube up separately and run them in shared-cluster mode; I still need to test that.
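
A rough sketch of that shared-cluster route, assuming minikube's kvm driver is installed (how the integration tests are then pointed at the cluster is left to their README, not shown here):

$ minikube start --vm-driver kvm
$ kubectl cluster-info   # note the API server URL to hand to the integration tests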

erikerlandson commented 7 years ago

To build without the Hadoop deps included you have to use -Phadoop-provided, which I'm not sure everybody knows. Simply not specifying a -Phadoop-x.x profile defaults the build to Hadoop 2.2; it does not build without Hadoop.
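
Building on the commands above, a "Hadoop-free" distribution would then look roughly like this (the --name value and profile list are illustrative; pick whichever profiles you actually need):

$ ./dev/make-distribution.sh --tgz --name hadoop-provided -Phadoop-provided -Pkubernetes -Pyarn -Phive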