cindysz110 / blog


[Spark] Building and Installing Spark from Source #24

Open cindygl opened 6 years ago


Environment:
OS: CentOS 7.3
Maven: maven-3.5.4
JDK: jdk-1.8.0_45
Scala: 2.11.12
Note: the whole build and install is done as the hadoop user. Only the steps explicitly marked "switch to root" are run as root.

1. Download the Spark source

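Go to the Spark website, pick the latest Spark release, choose "source code" as the package type, right-click to copy the link address, and download it on the server.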

[hadoop@hadoop01 source]$ wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0.tgz
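
The archive URL follows a regular pattern, so it can be derived from the version number. A minimal sketch, assuming the pattern seen in the wget command above holds for other releases too; `spark_src_url` is a hypothetical helper name:

```shell
# Hypothetical helper: build the archive URL of a Spark source release.
# The URL pattern is copied from the wget command above.
spark_src_url() {
  version="$1"
  echo "https://archive.apache.org/dist/spark/spark-${version}/spark-${version}.tgz"
}

# Print the URL; pass it to wget on a machine with network access.
spark_src_url 2.2.0
```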

2. Build the Spark source

Official documentation: Spark documentation - More - Building Spark

Per the official documentation, the build environment must provide: Maven 3.3.9 or newer; Java 8+; Scala 2.10+.
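
These minimum versions can be checked mechanically before starting a long build. A minimal sketch, assuming GNU `sort -V` is available (it ships with coreutils on CentOS 7); `version_ge` is a hypothetical helper name, not something Spark or Maven provides:

```shell
# version_ge A B: succeeds when version A >= version B (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# The Maven version used in this post against the documented minimum.
version_ge 3.5.4 3.3.9 && echo "Maven 3.5.4 satisfies >= 3.3.9"
# An older version, for contrast.
version_ge 3.0.5 3.3.9 || echo "Maven 3.0.5 is too old"
```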

2.1 Install Maven

# Download
[hadoop@hadoop01 software]$ wget http://mirror.stjschools.org/public/apache/maven/maven-3/3.5.4/binaries/apache-maven-3.5.4-bin.tar.gz

# Extract
[hadoop@hadoop01 software]$ tar -zxvf apache-maven-3.5.4-bin.tar.gz -C ~/app/

# Configure environment variables (switch to root)
[root@hadoop01 ~]# vi /etc/profile
export MAVEN_HOME=/home/hadoop/app/apache-maven-3.5.4
export PATH=$MAVEN_HOME/bin:$PATH
[root@hadoop01 ~]# source /etc/profile
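
Editing /etc/profile by hand makes it easy to add the same export twice when a step is repeated. A minimal sketch of an idempotent alternative, demonstrated on a scratch file; `append_once` is a hypothetical helper, and pointing it at /etc/profile would still require root:

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line exists.
append_once() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Demonstrated on a scratch file rather than the real /etc/profile.
profile="$(mktemp)"
append_once "$profile" 'export MAVEN_HOME=/home/hadoop/app/apache-maven-3.5.4'
append_once "$profile" 'export PATH=$MAVEN_HOME/bin:$PATH'
append_once "$profile" 'export PATH=$MAVEN_HOME/bin:$PATH'   # repeated call adds nothing
cat "$profile"
```

The same helper would work for the JAVA_HOME, SCALA_HOME and SPARK_HOME exports later in this post.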

# Verify the installation
[root@hadoop01 ~]# mvn --version
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-18T02:33:14+08:00)
Maven home: /home/hadoop/app/apache-maven-3.5.4
Java version: 1.8.0_45, vendor: Oracle Corporation, runtime: /usr/java/jdk1.8.0_45/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-514.el7.x86_64", arch: "amd64", family: "unix"
[root@hadoop01 ~]# 

# Change the Maven local repository path (where downloaded dependencies are stored)
[hadoop@hadoop01 ~]$ vim app/apache-maven-3.5.4/conf/settings.xml
<localRepository>/home/hadoop/maven_repo</localRepository>
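
For orientation, `<localRepository>` is a direct child of the top-level `<settings>` element. A minimal sketch that writes a settings file containing only that element; this is an alternative to editing the bundled conf/settings.xml, not what was done in this post. `write_settings` is a hypothetical helper, and the scratch path stands in for the usual per-user location ~/.m2/settings.xml:

```shell
# write_settings FILE REPO_DIR: write a minimal Maven settings file that only
# sets the local repository location.
write_settings() {
  cat > "$1" <<EOF
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <localRepository>$2</localRepository>
</settings>
EOF
}

# Scratch path for illustration only.
settings_file="$(mktemp)"
write_settings "$settings_file" /home/hadoop/maven_repo
cat "$settings_file"
```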

2.2 Install JDK 8+

# Uninstall the JDK bundled with the operating system
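
No commands are given for this step. A minimal sketch of one way to find the bundled packages on an RPM-based system, assuming they have "openjdk" in their names. That pattern is an assumption, `filter_jdk_pkgs` is a hypothetical helper, and the sample package names are made up:

```shell
# filter_jdk_pkgs: keep only package names that look like a bundled OpenJDK.
# The pattern is an assumption; adjust it to what `rpm -qa` prints on your host.
filter_jdk_pkgs() {
  grep -i 'openjdk' || true
}

# Real use (as root); review the list before removing anything:
#   rpm -qa | filter_jdk_pkgs
#   rpm -qa | filter_jdk_pkgs | xargs -r yum -y remove
# Demonstration on a made-up package list:
printf '%s\n' java-1.8.0-openjdk-headless-1.8.0.102 bash-4.2.46 | filter_jdk_pkgs
```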

# Install JDK 8
[root@hadoop01 ~]# mkdir /usr/java
[root@hadoop01 ~]# cp /opt/software/jdk-8u45-linux-x64.gz /usr/java/
[root@hadoop01 ~]# cd /usr/java/
[root@hadoop01 java]# tar -zxvf jdk-8u45-linux-x64.gz
[root@hadoop01 java]# chown -R root:root jdk1.8.0_45/

# Configure environment variables (switch to root)
[root@hadoop01 java]# vim /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_45
export PATH=$PATH:$JAVA_HOME/bin
[root@hadoop01 java]# source /etc/profile

# Verify the installation
[hadoop@hadoop01 software]$ java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
[hadoop@hadoop01 software]$ echo $JAVA_HOME
/usr/java/jdk1.8.0_45
[hadoop@hadoop01 software]$

2.3 Install Scala

# Download
[hadoop@hadoop01 software]$ wget https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz

# Extract
[hadoop@hadoop01 software]$ tar -zxvf scala-2.11.12.tgz -C ~/app/

# Configure environment variables (switch to root)
[root@hadoop01 software]# vi /etc/profile
export SCALA_HOME=/home/hadoop/app/scala-2.11.12
export PATH=$SCALA_HOME/bin:$PATH
[root@hadoop01 software]# source /etc/profile

# Verify the installation
[hadoop@hadoop01 ~]$ scala
Welcome to Scala 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45).
Type in expressions for evaluation. Or try :help.

scala> :q
[hadoop@hadoop01 ~]$

2.4 Install git (make-distribution may need it during the build) (switch to root)

[root@hadoop01 ~]# yum -y install git

2.5 Build Spark

# Ways to build from source (per the official documentation)
Option 1: build with Maven. This does not produce a tgz package and suits developers (MAVEN_OPTS must be set first).
[hadoop@hadoop01 source]$ tar -zxvf spark-2.2.0.tgz
[hadoop@hadoop01 source]$ cd spark-2.2.0/
[hadoop@hadoop01 spark-2.2.0]$ export MAVEN_OPTS=              # can also go in /etc/profile so it is set once and for all
[hadoop@hadoop01 spark-2.2.0]$ mvn -Pyarn -Phive -Phive-thriftserver -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 -DskipTests clean package
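
The MAVEN_OPTS export in Option 1 carries no value in the transcript. As a hedged pointer, the Spark 2.2 "Building Spark" page recommends giving Maven more memory with the options below. They are quoted from memory, so check them against the documentation for your Spark version:

```shell
# Options recommended by the Spark 2.2 "Building Spark" page (quoted from memory).
# The := form keeps an existing non-empty MAVEN_OPTS and only fills in a default.
: "${MAVEN_OPTS:=-Xmx2g -XX:ReservedCodeCacheSize=512m}"
export MAVEN_OPTS
echo "MAVEN_OPTS=$MAVEN_OPTS"
```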

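Option 2: build with make-distribution. This produces a tgz package that can be copied to the production environment and installed there. dev/make-distribution.sh in the Spark source tree already wraps the Maven command from Option 1 together with the MAVEN_OPTS setting, so you only need to pass in the Hadoop- and YARN-related parameters. This is the approach we use here.

# Step 1: edit dev/make-distribution.sh: comment out all of the Maven version-lookup steps and add variables that set each version directly (this lets the build skip those lookups and speeds it up)
[hadoop@hadoop01 source]$ tar -zxvf spark-2.2.0.tgz
[hadoop@hadoop01 source]$ cd spark-2.2.0/
[hadoop@hadoop01 spark-2.2.0]$ vi dev/make-distribution.sh
Comment out the following lines: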
# VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null | grep -v "INFO" | tail -n 1)
# SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | tail -n 1)
# SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | tail -n 1)
# SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | fgrep --count "<id>hive</id>";\
#     # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
#     # because we use "set -o pipefail"
#     echo -n)
Then add the following version settings below the commented-out block:
VERSION=2.2.0
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1   # 1 means Hive support is enabled
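
After editing, it is worth confirming that exactly these four assignments are left uncommented. A minimal sketch, demonstrated on a scratch file that mimics the edited script; `show_overrides` is a hypothetical helper, and for real use it would be pointed at dev/make-distribution.sh:

```shell
# show_overrides FILE: print the uncommented assignments of the four variables.
show_overrides() {
  grep -E '^(VERSION|SCALA_VERSION|SPARK_HADOOP_VERSION|SPARK_HIVE)=' "$1"
}

# Scratch file standing in for the edited script.
script="$(mktemp)"
cat > "$script" <<'EOF'
# VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null | grep -v "INFO" | tail -n 1)
VERSION=2.2.0
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1
EOF
show_overrides "$script"
```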

# Step 2: edit pom.xml to add the Cloudera repository (by default the build only uses the official repositories)
[hadoop@hadoop01 spark-2.2.0]$ vi pom.xml
Add the following inside the <repositories> element:
    <repository>
      <id>cloudera</id>
      <name>Cloudera Repository</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
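
A quick check that the repository entry actually landed in the POM can save a failed build later. A minimal sketch, demonstrated on a scratch file; `has_cloudera_repo` is a hypothetical helper, and for real use it would be given the pom.xml in the source root:

```shell
# has_cloudera_repo POM: succeeds if the POM mentions the Cloudera repository URL.
has_cloudera_repo() {
  grep -q 'repository.cloudera.com/artifactory/cloudera-repos' "$1"
}

# Scratch file standing in for pom.xml.
pom="$(mktemp)"
printf '%s\n' '<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>' > "$pom"
if has_cloudera_repo "$pom"; then echo "Cloudera repository present"; fi
```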

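# Step 3: build Spark (the run below passed -Dhadoop2.6, which only defines a Maven property; the profile switch, as used in Option 1, is -Phadoop-2.6)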
[hadoop@hadoop01 spark-2.2.0]$ ./dev/make-distribution.sh \
> --name 2.6.0-cdh5.7.0 \
> --tgz \
> -Dhadoop.version=2.6.0-cdh5.7.0 \
> -Dhadoop2.6 \
> -Phive -Phive-thriftserver \
> -Pyarn
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM 2.2.0 ..................... SUCCESS [04:06 min]
[INFO] Spark Project Tags ................................. SUCCESS [ 25.359 s]
[INFO] Spark Project Sketch ............................... SUCCESS [  4.362 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 33.652 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  5.041 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 29.857 s]
[INFO] Spark Project Launcher ............................. SUCCESS [01:18 min]
[INFO] Spark Project Core ................................. SUCCESS [02:15 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 30.105 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 15.674 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 30.125 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [01:17 min]
[INFO] Spark Project SQL .................................. SUCCESS [01:49 min]
[INFO] Spark Project ML Library ........................... SUCCESS [01:10 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 24.995 s]
[INFO] Spark Project Hive ................................. SUCCESS [ 39.965 s]
[INFO] Spark Project REPL ................................. SUCCESS [  4.626 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 41.294 s]
[INFO] Spark Project YARN ................................. SUCCESS [ 47.090 s]
[INFO] Spark Project Hive Thrift Server ................... SUCCESS [ 18.700 s]
[INFO] Spark Project Assembly ............................. SUCCESS [  2.468 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 33.714 s]
[INFO] Spark Project External Flume ....................... SUCCESS [ 10.476 s]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [  2.342 s]
[INFO] Spark Integration for Kafka 0.8 .................... SUCCESS [ 17.229 s]
[INFO] Kafka 0.10 Source for Structured Streaming ......... SUCCESS [ 10.172 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 15.039 s]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [  3.564 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 16.415 s]
[INFO] Spark Integration for Kafka 0.10 Assembly 2.2.0 .... SUCCESS [  4.040 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12:40 min (Wall Clock)
[INFO] Finished at: 2018-07-05T10:40:09+08:00
[INFO] ------------------------------------------------------------------------
+ rm -rf /home/hadoop/sourcecode/spark-2.2.0/dist
+ mkdir -p /home/hadoop/sourcecode/spark-2.2.0/dist/jars
+ echo 'Spark 2.2.0 built for Hadoop 2.6.0-cdh5.7.0'
+ echo 'Build flags: -Dhadoop.version=2.6.0-cdh5.7.0' -Dhadoop2.6 -Phive -Phive-thriftserver -Pyarn
+ cp /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/activation-1.1.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/antlr-2.7.7.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/antlr4-runtime-4.5.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/antlr-runtime-3.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/aopalliance-1.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/aopalliance-repackaged-2.4.0-b34.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/apacheds-i18n-2.0.0-M15.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/apacheds-kerberos-codec-2.0.0-M15.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/apache-log4j-extras-1.2.17.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/api-asn1-api-1.0.0-M20.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/api-util-1.0.0-M20.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/arpack_combined_all-0.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/avro-1.7.7.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/avro-ipc-1.7.7.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/avro-mapred-1.7.7-hadoop2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/aws-java-sdk-core-1.10.6.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/aws-java-sdk-kms-1.10.6.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/aws-java-sdk-s3-1.10.6.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/base64-2.3.8.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/bcprov-jdk15on-1.51.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/bonecp-0.8.0.RELEASE.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/breeze_2.11-0.13.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/breeze-macros_2.11-0.13.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/calcite-avatica-1.2.0-incubating.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/calcite-core-1.2.0-incubating.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/calcite-linq4j-1.2.0-incubating.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/chill_2.11-0.8.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/chill-java-0.8.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-beanutils-1.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-beanutils-core-1.8.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-cli-1.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-codec-1.10.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-collections-3.2.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-compiler-3.0.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-compress-1.4.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-configuration-1.6.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-crypto-1.0.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-dbcp-1.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-digester-1.8.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-httpclient-3.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-io-2.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-lang-2.6.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-lang3-3.5.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-logging-1.1.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-math3-3.4.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-net-2.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-pool-1.5.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/compress-lzf-1.0.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/core-1.1.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/curator-client-2.6.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/curator-framework-2.6.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/curator-recipes-2.6.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/datanucleus-api-jdo-3.2.6.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/datanucleus-core-3.2.10.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/datanucleus-rdbms-3.2.9.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/derby-10.12.1.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/eigenbase-properties-1.1.5.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/gson-2.2.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/guava-14.0.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/guice-3.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/guice-servlet-3.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-annotations-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-auth-2.6.0-cdh5.7.0.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-aws-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-client-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-common-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-hdfs-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-mapreduce-client-app-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-mapreduce-client-common-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-mapreduce-client-core-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-mapreduce-client-jobclient-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-mapreduce-client-shuffle-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-yarn-api-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-yarn-client-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-yarn-common-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-yarn-server-common-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-yarn-server-web-proxy-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hive-beeline-1.2.1.spark2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hive-cli-1.2.1.spark2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hive-exec-1.2.1.spark2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hive-jdbc-1.2.1.spark2.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hive-metastore-1.2.1.spark2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hk2-api-2.4.0-b34.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hk2-locator-2.4.0-b34.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hk2-utils-2.4.0-b34.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/htrace-core4-4.0.1-incubating.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/httpclient-4.5.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/httpcore-4.4.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/ivy-2.4.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-annotations-2.6.5.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-core-2.6.5.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-core-asl-1.9.13.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-databind-2.6.5.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-jaxrs-1.9.13.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-mapper-asl-1.9.13.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-module-paranamer-2.6.5.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-module-scala_2.11-2.6.5.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-xc-1.9.13.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/janino-3.0.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/JavaEWAH-0.3.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/javassist-3.18.1-GA.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/javax.annotation-api-1.2.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/javax.inject-1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/javax.inject-2.4.0-b34.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/java-xmlbuilder-1.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/javax.servlet-api-3.1.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/javax.ws.rs-api-2.0.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/javolution-5.5.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jaxb-api-2.2.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jcl-over-slf4j-1.7.16.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jdo-api-3.0.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jersey-client-2.22.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jersey-common-2.22.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jersey-container-servlet-2.22.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jersey-container-servlet-core-2.22.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jersey-guava-2.22.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jersey-media-jaxb-2.22.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jersey-server-2.22.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jets3t-0.9.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jetty-6.1.26.cloudera.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jetty-util-6.1.26.cloudera.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jline-2.12.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/joda-time-2.9.3.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jodd-core-3.5.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jpam-1.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/json4s-ast_2.11-3.2.11.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/json4s-core_2.11-3.2.11.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/json4s-jackson_2.11-3.2.11.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jsr305-1.3.9.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jta-1.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jtransforms-2.4.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jul-to-slf4j-1.7.16.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/kryo-shaded-3.0.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/leveldbjni-all-1.8.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/libfb303-0.9.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/libthrift-0.9.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/log4j-1.2.17.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/lz4-1.3.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/machinist_2.11-0.6.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/macro-compat_2.11-1.1.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/mail-1.4.7.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/metrics-core-3.1.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/metrics-graphite-3.1.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/metrics-json-3.1.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/metrics-jvm-3.1.2.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/minlog-1.3.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/mx4j-3.0.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/netty-3.9.9.Final.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/netty-all-4.0.43.Final.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/objenesis-2.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/opencsv-2.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/oro-2.0.8.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/osgi-resource-locator-1.0.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/paranamer-2.6.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/parquet-column-1.8.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/parquet-common-1.8.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/parquet-encoding-1.8.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/parquet-format-2.3.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/parquet-hadoop-1.8.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/parquet-hadoop-bundle-1.6.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/parquet-jackson-1.8.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/pmml-model-1.2.15.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/pmml-schema-1.2.15.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/protobuf-java-2.5.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/py4j-0.10.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/pyrolite-4.13.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/RoaringBitmap-0.5.11.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/scala-compiler-2.11.8.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/scala-library-2.11.8.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/scalap-2.11.8.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/scala-parser-combinators_2.11-1.0.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/scala-reflect-2.11.8.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/scala-xml_2.11-1.0.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/shapeless_2.11-2.3.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/slf4j-api-1.7.16.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/slf4j-log4j12-1.7.16.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/snappy-0.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/snappy-java-1.1.2.6.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-catalyst_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-core_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-graphx_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-hive_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-hive-thriftserver_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-launcher_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-mllib_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-mllib-local_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-network-common_2.11-2.2.0.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-network-shuffle_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-repl_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-sketch_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-sql_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-streaming_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-tags_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-unsafe_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-yarn_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spire_2.11-0.13.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spire-macros_2.11-0.13.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/ST4-4.0.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/stax-api-1.0.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/stax-api-1.0-2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/stream-2.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/stringtemplate-3.2.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/super-csv-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/univocity-parsers-2.2.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/validation-api-1.1.0.Final.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/xbean-asm5-shaded-4.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/xercesImpl-2.9.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/xmlenc-0.52.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/xz-1.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/zookeeper-3.4.6.jar /home/hadoop/sourcecode/spark-2.2.0/dist/jars/
+ '[' -f /home/hadoop/sourcecode/spark-2.2.0/common/network-yarn/target/scala-2.11/spark-2.2.0-yarn-shuffle.jar ']'
+ mkdir /home/hadoop/sourcecode/spark-2.2.0/dist/yarn
+ cp /home/hadoop/sourcecode/spark-2.2.0/common/network-yarn/target/scala-2.11/spark-2.2.0-yarn-shuffle.jar /home/hadoop/sourcecode/spark-2.2.0/dist/yarn
+ mkdir -p /home/hadoop/sourcecode/spark-2.2.0/dist/examples/jars
+ cp /home/hadoop/sourcecode/spark-2.2.0/examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar /home/hadoop/sourcecode/spark-2.2.0/examples/target/scala-2.11/jars/spark-examples_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/dist/examples/jars
+ for f in '"$DISTDIR"/examples/jars/*'
++ basename /home/hadoop/sourcecode/spark-2.2.0/dist/examples/jars/scopt_2.11-3.3.0.jar
+ name=scopt_2.11-3.3.0.jar
+ '[' -f /home/hadoop/sourcecode/spark-2.2.0/dist/jars/scopt_2.11-3.3.0.jar ']'
+ for f in '"$DISTDIR"/examples/jars/*'
++ basename /home/hadoop/sourcecode/spark-2.2.0/dist/examples/jars/spark-examples_2.11-2.2.0.jar
+ name=spark-examples_2.11-2.2.0.jar
+ '[' -f /home/hadoop/sourcecode/spark-2.2.0/dist/jars/spark-examples_2.11-2.2.0.jar ']'
+ mkdir -p /home/hadoop/sourcecode/spark-2.2.0/dist/examples/src/main
+ cp -r /home/hadoop/sourcecode/spark-2.2.0/examples/src/main /home/hadoop/sourcecode/spark-2.2.0/dist/examples/src/
+ cp /home/hadoop/sourcecode/spark-2.2.0/LICENSE /home/hadoop/sourcecode/spark-2.2.0/dist
+ cp -r /home/hadoop/sourcecode/spark-2.2.0/licenses /home/hadoop/sourcecode/spark-2.2.0/dist
+ cp /home/hadoop/sourcecode/spark-2.2.0/NOTICE /home/hadoop/sourcecode/spark-2.2.0/dist
+ '[' -e /home/hadoop/sourcecode/spark-2.2.0/CHANGES.txt ']'
+ cp -r /home/hadoop/sourcecode/spark-2.2.0/data /home/hadoop/sourcecode/spark-2.2.0/dist
+ '[' false == true ']'
+ echo 'Skipping building python distribution package'
Skipping building python distribution package
+ '[' false == true ']'
+ echo 'Skipping building R source package'
Skipping building R source package
+ mkdir /home/hadoop/sourcecode/spark-2.2.0/dist/conf
+ cp /home/hadoop/sourcecode/spark-2.2.0/conf/docker.properties.template /home/hadoop/sourcecode/spark-2.2.0/conf/fairscheduler.xml.template /home/hadoop/sourcecode/spark-2.2.0/conf/log4j.properties.template /home/hadoop/sourcecode/spark-2.2.0/conf/metrics.properties.template /home/hadoop/sourcecode/spark-2.2.0/conf/slaves.template /home/hadoop/sourcecode/spark-2.2.0/conf/spark-defaults.conf.template /home/hadoop/sourcecode/spark-2.2.0/conf/spark-env.sh.template /home/hadoop/sourcecode/spark-2.2.0/dist/conf
+ cp /home/hadoop/sourcecode/spark-2.2.0/README.md /home/hadoop/sourcecode/spark-2.2.0/dist
+ cp -r /home/hadoop/sourcecode/spark-2.2.0/bin /home/hadoop/sourcecode/spark-2.2.0/dist
+ cp -r /home/hadoop/sourcecode/spark-2.2.0/python /home/hadoop/sourcecode/spark-2.2.0/dist
+ '[' false == true ']'
+ cp -r /home/hadoop/sourcecode/spark-2.2.0/sbin /home/hadoop/sourcecode/spark-2.2.0/dist
+ '[' -d /home/hadoop/sourcecode/spark-2.2.0/R/lib/SparkR ']'
+ '[' true == true ']'
+ TARDIR_NAME=spark-2.2.0-bin-2.6.0-cdh5.7.0
+ TARDIR=/home/hadoop/sourcecode/spark-2.2.0/spark-2.2.0-bin-2.6.0-cdh5.7.0
+ rm -rf /home/hadoop/sourcecode/spark-2.2.0/spark-2.2.0-bin-2.6.0-cdh5.7.0
+ cp -r /home/hadoop/sourcecode/spark-2.2.0/dist /home/hadoop/sourcecode/spark-2.2.0/spark-2.2.0-bin-2.6.0-cdh5.7.0
+ tar czf spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz -C /home/hadoop/sourcecode/spark-2.2.0 spark-2.2.0-bin-2.6.0-cdh5.7.0
+ rm -rf /home/hadoop/sourcecode/spark-2.2.0/spark-2.2.0-bin-2.6.0-cdh5.7.0
[hadoop@hadoop01 spark-2.2.0]$
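
With this much output, it helps to save the log to a file and test for the success marker rather than scroll for it. A minimal sketch; `build_succeeded` is a hypothetical helper and build.log is an arbitrary file name:

```shell
# build_succeeded LOG: succeeds if Maven reported BUILD SUCCESS in the log.
build_succeeded() {
  grep -q '^\[INFO\] BUILD SUCCESS' "$1"
}

# Real use, with the same flags as the command above:
#   ./dev/make-distribution.sh <flags> 2>&1 | tee build.log
#   build_succeeded build.log && echo "build OK"
# Demonstration on a two-line sample log:
log="$(mktemp)"
printf '%s\n' '[INFO] Reactor Summary:' '[INFO] BUILD SUCCESS' > "$log"
build_succeeded "$log" && echo "build OK"
```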

# When the build finishes, a tgz package appears in the Spark source root
[hadoop@hadoop01 spark-2.2.0]$ ls -ltr spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz 
-rw-rw-r-- 1 hadoop hadoop 199149808 Jul  5 10:40 spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz
[hadoop@hadoop01 spark-2.2.0]$ 
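
The package name follows from the TARDIR_NAME line in the trace above: the Spark version, then "bin", then the value passed to --name. A minimal sketch of that naming rule; `dist_tgz_name` is a hypothetical helper:

```shell
# dist_tgz_name VERSION NAME: file name the build is expected to produce,
# following the TARDIR_NAME line in the trace (spark-VERSION-bin-NAME).
dist_tgz_name() {
  echo "spark-$1-bin-$2.tgz"
}

dist_tgz_name 2.2.0 2.6.0-cdh5.7.0
```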

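3. Install Spark

Typically we build Spark in a local environment, then copy the package to the production server and extract it there.

# Copy the tgz package to the production server and extract it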
[hadoop@hadoop01 software]$ tar -zxvf spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz -C ~/app/
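
Before extracting on a production host, listing the archive shows which directory it will create. A minimal sketch, demonstrated on a throwaway archive; `tgz_top_dirs` is a hypothetical helper, and for real use it would be given the Spark tgz:

```shell
# tgz_top_dirs FILE: list the distinct top-level entries inside a .tgz archive.
tgz_top_dirs() {
  tar -tzf "$1" | cut -d/ -f1 | sort -u
}

# Build a tiny throwaway archive to demonstrate on.
work="$(mktemp -d)"
mkdir -p "$work/spark-demo/bin"
: > "$work/spark-demo/bin/placeholder"
tar -czf "$work/demo.tgz" -C "$work" spark-demo
tgz_top_dirs "$work/demo.tgz"
```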

# Configure environment variables (switch to root)
[root@hadoop01 ~]# vi /etc/profile
export SPARK_HOME=/home/hadoop/app/spark-2.2.0-bin-2.6.0-cdh5.7.0
export PATH=$SPARK_HOME/bin:$PATH
[root@hadoop01 ~]# source /etc/profile

# Verify the installation
[hadoop@hadoop01 ~]$ spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/07/10 02:25:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/07/10 02:25:48 ERROR ObjectStore: Version information found in metastore differs 1.1.0 from expected schema version 1.2.0. Schema verififcation is disabled hive.metastore.schema.verification so setting version.
18/07/10 02:25:49 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://10.132.37.38:4040
Spark context available as 'sc' (master = local[*], app id = local-1531160743401).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
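
Starting spark-shell confirms that the installation launches. A non-interactive check is to submit the bundled SparkPi example. A minimal sketch, assuming SPARK_HOME is set as above and the examples jar has the name seen in the build trace; `run_sparkpi` is a hypothetical wrapper:

```shell
# run_sparkpi: submit the bundled SparkPi example in local mode.
# The jar name matches the examples jar copied during the build above.
run_sparkpi() {
  spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master 'local[2]' \
    "$SPARK_HOME/examples/jars/spark-examples_2.11-2.2.0.jar" 10
}

# Only run when spark-submit is actually on PATH.
if command -v spark-submit >/dev/null 2>&1; then
  run_sparkpi
else
  echo "spark-submit not found on PATH; skipping smoke test"
fi
```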

# Spark directory layout
[hadoop@hadoop01 software]$ cd ~/app/spark-2.2.0-bin-2.6.0-cdh5.7.0/
[hadoop@hadoop01 spark-2.2.0-bin-2.6.0-cdh5.7.0]$ ls -ltr
total 104
drwxrwxr-x 2 hadoop hadoop  4096 Jul  5 10:40 yarn       # YARN-related jars
-rw-r--r-- 1 hadoop hadoop  3809 Jul  5 10:40 README.md
-rw-r--r-- 1 hadoop hadoop 24645 Jul  5 10:40 NOTICE
drwxr-xr-x 2 hadoop hadoop  4096 Jul  5 10:40 licenses
-rw-r--r-- 1 hadoop hadoop 17881 Jul  5 10:40 LICENSE
drwxrwxr-x 4 hadoop hadoop  4096 Jul  5 10:40 examples   # examples bundled with Spark (worth a close look)
drwxr-xr-x 5 hadoop hadoop  4096 Jul  5 10:40 data       # sample data
drwxrwxr-x 2 hadoop hadoop  4096 Jul  5 10:40 conf       # configuration files
drwxr-xr-x 2 hadoop hadoop  4096 Jul  5 10:40 sbin       # server-side scripts: starting and stopping the cluster, etc.
-rw-rw-r-- 1 hadoop hadoop   135 Jul  5 10:40 RELEASE
drwxr-xr-x 6 hadoop hadoop  4096 Jul  5 10:40 python
drwxrwxr-x 2 hadoop hadoop 16384 Jul  5 10:40 jars       # Spark jars
drwxr-xr-x 2 hadoop hadoop  4096 Jul  5 10:40 bin        # client-side scripts (the .cmd files are for Windows)
[hadoop@hadoop01 spark-2.2.0-bin-2.6.0-cdh5.7.0]$