astronomy-commons / axs

Astronomy eXtensions for Spark: Fast, Scalable, Analytics of Billion+ row catalogs
https://axs.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Documentation for build with newer versions of Spark #20

Open Dav-v opened 4 years ago

Dav-v commented 4 years ago

Hi all, maybe this would be more appropriate on the axs-spark repo, but it's not possible to open issues there, so I'm posting here. I would like to install AXS on a standalone cluster with a more recent version of Spark (2.4.5 or even 3.0-preview). Is there documentation explaining how to prepare the distribution? I noticed that axs-spark has branches for Spark versions such as 2.3.0, 2.4.3 and 3.0-preview, but there is an AXS release only for Spark 2.4.0. I see that @stevenstetzler is testing an automatic pipeline to create an AXS distribution with Spark 3.0-preview; would it be possible to do it manually while that is not ready? Thanks a lot, Davide Viero

stevenstetzler commented 4 years ago

Yes, I'll put the instructions here and we should add documentation on the website on how to build from source as well.

Building AXS essentially means merging a few pieces of AXS into Spark before building Spark from source. You can find documentation on building Spark from source here: https://spark.apache.org/docs/latest/building-spark.html. You'll need to download and install Maven, which lets you compile Java projects from source, including their dependencies. From the Spark documentation: "Building Spark using Maven requires Maven 3.5.4 and Java 8"
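Before starting, it can help to confirm which Java version is on the PATH, since the Spark documentation quoted above asks for Java 8. The following is a minimal sketch, not part of AXS or Spark: the helper names `java_major_version`, `is_java8` and `detected_java_version` are hypothetical, and the parsing assumes the usual `java -version` output format with the version string in double quotes.

```shell
#!/bin/sh
# Hypothetical helpers (not part of AXS or Spark) for checking the Java version.

# Extract the major version from a Java version string.
# Legacy scheme: "1.8.0_252" -> 8; newer scheme: "11.0.7" -> 11.
java_major_version() {
    case "$1" in
        1.*) echo "$1" | cut -d. -f2 ;;
        *)   echo "$1" | cut -d. -f1 ;;
    esac
}

# Succeed (exit status 0) only if the given version string is Java 8.
is_java8() {
    [ "$(java_major_version "$1")" = "8" ]
}

# Print the version string reported by the java binary on the PATH.
# Assumes `java -version` prints a line such as: openjdk version "1.8.0_252"
detected_java_version() {
    java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'
}
```

A possible use is `is_java8 "$(detected_java_version)" || echo "Java 8 not active"`, followed by `mvn -version` to see which Maven and which JDK Maven itself picks up.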

First, clone this repository and check out the branch/tag/commit you want:

git clone https://github.com/astronomy-commons/axs.git
cd axs
git checkout master
cd .. # back to the parent directory, so axs and axs-spark end up side by side

Next, do the same with the axs-spark repository:

git clone https://github.com/astronomy-commons/axs-spark
cd axs-spark
git checkout axs-3.0.0-preview
cd .. # the steps below use paths relative to the parent directory

1) Build AxsUtilities.jar

cd axs/AxsUtilities
mvn package # runs Maven to compile the AxsUtilities project; pom.xml sets the build configuration
cd ../.. # back to the parent directory

The created jar will be in axs/AxsUtilities/target.

2) Merge axs and Spark

cp -r ./axs/axs ./axs-spark/python/. # adds the Python components of axs to Spark's PYTHONPATH
cp -r ./axs/AxsUtilities/target/*.jar ./axs-spark/python/axs/. # adds the compiled AXS jar for use in Spark

3) Build Spark from source

cd axs-spark
./dev/make-distribution.sh --name AXS-Custom-Build --tgz -Phadoop-2.7 -Pmesos -Pyarn -Phive -Phive-thriftserver -Pkubernetes

This will build Spark from source and produce a tar file (--tgz) named after the Spark version and the --name value, something like spark-3.0.0-preview-bin-AXS-Custom-Build.tgz. -Phadoop-2.7 builds Spark along with the Hadoop 2.7 binaries; Hadoop can also be an external library on your system. -Pmesos -Pyarn -Pkubernetes build Spark with scheduling support for Mesos, YARN, and Kubernetes. -Phive -Phive-thriftserver enable support for Hive, which AXS depends on for storing catalog metadata.
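Once the build finishes, the tarball still has to be unpacked somewhere before it can be used. The sketch below shows one way to do that; it is an illustration, not an official AXS step. The helper names `dist_tarball_name` and `unpack_dist` are hypothetical, and the `spark-<version>-bin-<name>.tgz` naming pattern (with a single top-level directory of the same name inside the archive) is assumed from upstream Spark's make-distribution.sh. The actual version string depends on the branch you checked out.

```shell
#!/bin/sh
# Hypothetical helpers (not part of AXS or Spark) for handling the built tarball.

# Compose the archive name assumed to be written by make-distribution.sh
# for a given Spark version ($1) and --name value ($2).
dist_tarball_name() {
    echo "spark-$1-bin-$2.tgz"
}

# Unpack a distribution tarball ($1) into a destination directory ($2)
# and print the path of the unpacked distribution, usable as SPARK_HOME.
unpack_dist() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" && tar -xzf "$tarball" -C "$dest" || return 1
    # Assumption: the archive holds one top-level directory named like the tarball.
    echo "$dest/$(basename "$tarball" .tgz)"
}
```

For example, `export SPARK_HOME="$(unpack_dist "$(dist_tarball_name 3.0.0-preview AXS-Custom-Build)" "$HOME/opt")"` would unpack the archive from the current directory into a hypothetical `$HOME/opt` and point SPARK_HOME at it.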

stevenstetzler commented 4 years ago

Also, if you don't want to go through building from source, these distributions should have Spark 3.0.0 support:

From @ctslater: https://epyc.astro.washington.edu/~ctslater/axs-spark-3.0.0-preview-axsdistfix.tar.gz
From one of our preliminary automated builds: https://github.com/stevenstetzler/axs/releases/download/v3.0.0-preview/axs-distribution.tgz

Dav-v commented 4 years ago

Great, thanks a lot. It would indeed be very useful to put this on the AXS documentation pages, also because it took me a while to find out that axs-spark exists, since it is mentioned neither in the documentation nor in the README. The repository astronomy-commons/axs is more visible and easier to find on Google than astronomy-commons/axs-spark, so it would be nice to explain the relationship between them in the documentation for future users.

Thanks, I'll install this version then

stargaser commented 4 years ago

Thanks very much @stevenstetzler for posting these build instructions. I've successfully built the 3.0.0 preview at IPAC.

Two minor hiccups: the build did not work with Java 11, but it did work with Java 8. Building the YARN and Mesos parts kept failing until I copied some certificates from an existing Java 8 installation to the OpenJDK that I was using with Maven.
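The comment above does not say which certificates were copied. One possible reading is that the whole CA trust store (the `cacerts` file) was taken from a working Java 8 installation. The sketch below illustrates that reading only: the helper names `find_cacerts` and `copy_cacerts` are hypothetical, and the two relative paths are the usual trust store locations (`jre/lib/security` in a Java 8 JDK, `lib/security` in a JRE or a newer JDK), which may differ on your system.

```shell
#!/bin/sh
# Hypothetical helpers illustrating one reading of the certificate workaround.

# Print the path of the cacerts trust store inside a Java home ($1), if present.
find_cacerts() {
    for rel in jre/lib/security/cacerts lib/security/cacerts; do
        if [ -f "$1/$rel" ]; then
            echo "$1/$rel"
            return 0
        fi
    done
    return 1
}

# Back up the trust store of the destination Java home ($2), then overwrite it
# with the trust store from the source Java home ($1).
copy_cacerts() {
    src=$(find_cacerts "$1") || return 1
    dst=$(find_cacerts "$2") || return 1
    cp "$dst" "$dst.bak" && cp "$src" "$dst"
}
```

It would be called as `copy_cacerts /path/to/working/java8 "$JAVA_HOME"`, where the first path is a placeholder. The original trust store is kept next to the new one with a `.bak` suffix so the change can be reverted.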