Open cindygl opened 6 years ago
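Environment: OS: CentOS 7.3; Maven: maven-3.5.4; JDK: jdk-1.8.0_45; Scala: 2.11.12. Note: the whole build and installation described here is done as the hadoop user. Only the steps explicitly marked "switch to root" are run as root.
Go to the Spark website, pick the latest Spark release, choose "Source Code" as the package type, then right-click to copy the link address and download it on the server.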
[hadoop@hadoop01 source]$ wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0.tgz
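Official documentation: Spark documentation - More - Building Spark
According to the official documentation, the environment must meet these requirements before building Spark: Maven: 3.3.9 or newer; Java: 8+; Scala: 2.10+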
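The version requirements above can be checked mechanically before starting a long build. The following is a minimal sketch, not part of the original post: `version_ge` is a hypothetical helper name, and it assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```shell
# version_ge A B: succeed when dotted version A is greater than or equal to B.
# Hypothetical helper; relies on `sort -V` (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Values taken from the environment listed at the top of this post.
if version_ge "3.5.4" "3.3.9"; then
  echo "Maven 3.5.4 satisfies the 3.3.9 minimum"
fi
```

Sorting the two versions and checking that the required minimum comes first avoids the usual pitfall of plain string comparison, where 3.10 would sort before 3.9.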
# Download
[hadoop@hadoop01 software]$ wget http://mirror.stjschools.org/public/apache/maven/maven-3/3.5.4/binaries/apache-maven-3.5.4-bin.tar.gz
# Extract
[hadoop@hadoop01 software]$ tar -zxvf apache-maven-3.5.4-bin.tar.gz -C ~/app/
# Configure environment variables (switch to root)
[root@hadoop01 ~]# vi /etc/profile
export MAVEN_HOME=/home/hadoop/app/apache-maven-3.5.4
export PATH=$MAVEN_HOME/bin:$PATH
[root@hadoop01 ~]# source /etc/profile
# Verify the installation
[root@hadoop01 ~]# mvn --version
Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-18T02:33:14+08:00)
Maven home: /home/hadoop/app/apache-maven-3.5.4
Java version: 1.8.0_45, vendor: Oracle Corporation, runtime: /usr/java/jdk1.8.0_45/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-514.el7.x86_64", arch: "amd64", family: "unix"
[root@hadoop01 ~]#
# Change the Maven local repository path (where Maven stores downloaded dependencies)
[hadoop@hadoop01 ~]$ vim app/apache-maven-3.5.4/conf/settings.xml
<localRepository>/home/hadoop/maven_repo</localRepository>
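Editing /etc/profile by hand works, but repeating the procedure tends to leave duplicate export lines behind. As an illustration only, the sketch below appends a line just once. `append_once` is a hypothetical helper, and the demo writes to a scratch file rather than the real /etc/profile.

```shell
# append_once FILE LINE: add LINE to FILE unless an identical line is already present.
# Hypothetical helper, not something Maven or CentOS provides.
append_once() {
  local file="$1" line="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a scratch file; on the real host the target would be /etc/profile (as root).
profile="$(mktemp)"
append_once "$profile" 'export MAVEN_HOME=/home/hadoop/app/apache-maven-3.5.4'
append_once "$profile" 'export PATH=$MAVEN_HOME/bin:$PATH'
append_once "$profile" 'export MAVEN_HOME=/home/hadoop/app/apache-maven-3.5.4'  # second call is a no-op
cat "$profile"
```

The single quotes keep `$MAVEN_HOME` and `$PATH` literal, so they are expanded when the profile is sourced rather than when the line is written.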
# Uninstall the JDK that ships with the operating system
# Install JDK 8
[root@hadoop01 ~]# mkdir /usr/java
[root@hadoop01 ~]# cp /opt/software/jdk-8u45-linux-x64.gz /usr/java/
[root@hadoop01 ~]# cd /usr/java/
[root@hadoop01 java]# tar -zxvf jdk-8u45-linux-x64.gz
[root@hadoop01 java]# chown -R root:root jdk1.8.0_45/
# Configure environment variables (switch to root)
[root@hadoop01 java]# vim /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_45
export PATH=$PATH:$JAVA_HOME/bin
[root@hadoop01 java]# source /etc/profile
# Verify the installation
[hadoop@hadoop01 software]$ java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
[hadoop@hadoop01 software]$ echo $JAVA_HOME
/usr/java/jdk1.8.0_45
[hadoop@hadoop01 software]$
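To compare the installed JDK against the "Java 8+" requirement in a script, the version string has to be reduced to a major number first. A small sketch, with `java_major` as a hypothetical helper name:

```shell
# java_major VERSION: map a Java version string to its major number,
# e.g. "1.8.0_45" -> 8 and "11.0.2" -> 11. Hypothetical helper.
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy "1.x" numbering used up to Java 8
  esac
  echo "${v%%[._]*}"      # keep everything before the first "." or "_"
}

java_major "1.8.0_45"

# On a host with a JDK installed, the version string could be taken from
# `java -version`, which writes to stderr:
#   java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'
```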
# Download
[hadoop@hadoop01 software]$ wget https://downloads.lightbend.com/scala/2.11.12/scala-2.11.12.tgz
# Extract
[hadoop@hadoop01 software]$ tar -zxvf scala-2.11.12.tgz -C ~/app/
# Configure environment variables (switch to root)
[root@hadoop01 software]# vi /etc/profile
export SCALA_HOME=/home/hadoop/app/scala-2.11.12
export PATH=$SCALA_HOME/bin:$PATH
[root@hadoop01 software]# source /etc/profile
# Verify the installation
[hadoop@hadoop01 ~]$ scala
Welcome to Scala 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45).
Type in expressions for evaluation. Or try :help.
scala> :q
[hadoop@hadoop01 ~]$
[root@hadoop01 ~]# yum -y install git
# Building from source (per the official documentation)
Method 1: build with Maven. This does not produce a tgz package and suits developers (MAVEN_OPTS must be configured first).
[hadoop@hadoop01 source]$ tar -zxvf spark-2.2.0.tgz
[hadoop@hadoop01 source]$ cd spark-2.2.0/
# The MAVEN_OPTS value is missing in the original post; the Spark 2.2.0 build documentation uses "-Xmx2g -XX:ReservedCodeCacheSize=512m".
# The export can also go into /etc/profile so that it only has to be set once.
[hadoop@hadoop01 spark-2.2.0]$ export MAVEN_OPTS=
[hadoop@hadoop01 spark-2.2.0]$ mvn -Pyarn -Phive -Phive-thriftserver -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 -DskipTests clean package
Method 2: build with make-distribution. This produces a tgz package that can be copied to the production environment and installed there. The dev/make-distribution.sh script in the Spark source tree already wraps the Maven command from Method 1 together with the MAVEN_OPTS settings, so only the Hadoop- and YARN-related options need to be passed in. This is the method used here.
# Step 1: edit dev/make-distribution.sh, comment out all of the Maven-based version detection, and set the version variables by hand (this lets the build skip those lookups, which speeds it up)
[hadoop@hadoop01 source]$ tar -zxvf spark-2.2.0.tgz
[hadoop@hadoop01 source]$ cd spark-2.2.0/
[hadoop@hadoop01 spark-2.2.0]$ vi dev/make-distribution.sh
Comment out the following:
# VERSION=$("$MVN" help:evaluate -Dexpression=project.version $@ 2>/dev/null | grep -v "INFO" | tail -n 1)
# SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | tail -n 1)
# SPARK_HADOOP_VERSION=$("$MVN" help:evaluate -Dexpression=hadoop.version $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | tail -n 1)
# SPARK_HIVE=$("$MVN" help:evaluate -Dexpression=project.activeProfiles -pl sql/hive $@ 2>/dev/null\
#     | grep -v "INFO"\
#     | fgrep --count "<id>hive</id>";\
#     # Reset exit status to 0, otherwise the script stops here if the last grep finds nothing\
#     # because we use "set -o pipefail"
#     echo -n)
Then add the following version settings below the commented-out block:
VERSION=2.2.0
SCALA_VERSION=2.11
SPARK_HADOOP_VERSION=2.6.0-cdh5.7.0
SPARK_HIVE=1 # Hive support enabled
# Step 2: edit pom.xml and add the Cloudera repository (by default the official Apache repository is used)
[hadoop@hadoop01 spark-2.2.0]$ vi pom.xml
Add the following inside the <repositories> element:
<repository>
  <id>cloudera</id>
  <name>Cloudera Repository</name>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
# Step 3: build Spark
# Note: -Dhadoop2.6 is shown as it was actually run (it also appears in the "Build flags" line of the log below); it is probably a typo for the -Phadoop-2.6 profile used in Method 1.
[hadoop@hadoop01 spark-2.2.0]$ ./dev/make-distribution.sh \
> --name 2.6.0-cdh5.7.0 \
> --tgz \
> -Dhadoop.version=2.6.0-cdh5.7.0 \
> -Dhadoop2.6 \
> -Phive -Phive-thriftserver \
> -Pyarn
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM 2.2.0 ..................... SUCCESS [04:06 min]
[INFO] Spark Project Tags ................................. SUCCESS [ 25.359 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 4.362 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 33.652 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 5.041 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 29.857 s]
[INFO] Spark Project Launcher ............................. SUCCESS [01:18 min]
[INFO] Spark Project Core ................................. SUCCESS [02:15 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 30.105 s]
[INFO] Spark Project GraphX ............................... SUCCESS [ 15.674 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 30.125 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [01:17 min]
[INFO] Spark Project SQL .................................. SUCCESS [01:49 min]
[INFO] Spark Project ML Library ........................... SUCCESS [01:10 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 24.995 s]
[INFO] Spark Project Hive ................................. SUCCESS [ 39.965 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 4.626 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 41.294 s]
[INFO] Spark Project YARN ................................. SUCCESS [ 47.090 s]
[INFO] Spark Project Hive Thrift Server ................... SUCCESS [ 18.700 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 2.468 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 33.714 s]
[INFO] Spark Project External Flume ....................... SUCCESS [ 10.476 s]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [ 2.342 s]
[INFO] Spark Integration for Kafka 0.8 .................... SUCCESS [ 17.229 s]
[INFO] Kafka 0.10 Source for Structured Streaming ......... SUCCESS [ 10.172 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 15.039 s]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 3.564 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 16.415 s]
[INFO] Spark Integration for Kafka 0.10 Assembly 2.2.0 .... SUCCESS [ 4.040 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12:40 min (Wall Clock)
[INFO] Finished at: 2018-07-05T10:40:09+08:00
[INFO] ------------------------------------------------------------------------
+ rm -rf /home/hadoop/sourcecode/spark-2.2.0/dist
+ mkdir -p /home/hadoop/sourcecode/spark-2.2.0/dist/jars
+ echo 'Spark 2.2.0 built for Hadoop 2.6.0-cdh5.7.0'
+ echo 'Build flags: -Dhadoop.version=2.6.0-cdh5.7.0' -Dhadoop2.6 -Phive -Phive-thriftserver -Pyarn
+ cp /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/activation-1.1.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/antlr-2.7.7.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/antlr4-runtime-4.5.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/antlr-runtime-3.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/aopalliance-1.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/aopalliance-repackaged-2.4.0-b34.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/apacheds-i18n-2.0.0-M15.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/apacheds-kerberos-codec-2.0.0-M15.jar
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/apache-log4j-extras-1.2.17.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/api-asn1-api-1.0.0-M20.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/api-util-1.0.0-M20.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/arpack_combined_all-0.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/avro-1.7.7.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/avro-ipc-1.7.7.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/avro-mapred-1.7.7-hadoop2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/aws-java-sdk-core-1.10.6.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/aws-java-sdk-kms-1.10.6.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/aws-java-sdk-s3-1.10.6.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/base64-2.3.8.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/bcprov-jdk15on-1.51.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/bonecp-0.8.0.RELEASE.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/breeze_2.11-0.13.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/breeze-macros_2.11-0.13.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/calcite-avatica-1.2.0-incubating.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/calcite-core-1.2.0-incubating.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/calcite-linq4j-1.2.0-incubating.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/chill_2.11-0.8.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/chill-java-0.8.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-beanutils-1.7.0.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-beanutils-core-1.8.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-cli-1.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-codec-1.10.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-collections-3.2.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-compiler-3.0.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-compress-1.4.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-configuration-1.6.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-crypto-1.0.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-dbcp-1.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-digester-1.8.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-httpclient-3.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-io-2.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-lang-2.6.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-lang3-3.5.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-logging-1.1.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-math3-3.4.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-net-2.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/commons-pool-1.5.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/compress-lzf-1.0.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/core-1.1.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/curator-client-2.6.0.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/curator-framework-2.6.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/curator-recipes-2.6.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/datanucleus-api-jdo-3.2.6.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/datanucleus-core-3.2.10.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/datanucleus-rdbms-3.2.9.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/derby-10.12.1.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/eigenbase-properties-1.1.5.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/gson-2.2.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/guava-14.0.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/guice-3.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/guice-servlet-3.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-annotations-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-auth-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-aws-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-client-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-common-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-hdfs-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-mapreduce-client-app-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-mapreduce-client-common-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-mapreduce-client-core-2.6.0-cdh5.7.0.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-mapreduce-client-jobclient-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-mapreduce-client-shuffle-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-yarn-api-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-yarn-client-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-yarn-common-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-yarn-server-common-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hadoop-yarn-server-web-proxy-2.6.0-cdh5.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hive-beeline-1.2.1.spark2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hive-cli-1.2.1.spark2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hive-exec-1.2.1.spark2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hive-jdbc-1.2.1.spark2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hive-metastore-1.2.1.spark2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hk2-api-2.4.0-b34.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hk2-locator-2.4.0-b34.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/hk2-utils-2.4.0-b34.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/htrace-core4-4.0.1-incubating.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/httpclient-4.5.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/httpcore-4.4.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/ivy-2.4.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-annotations-2.6.5.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-core-2.6.5.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-core-asl-1.9.13.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-databind-2.6.5.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-jaxrs-1.9.13.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-mapper-asl-1.9.13.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-module-paranamer-2.6.5.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-module-scala_2.11-2.6.5.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jackson-xc-1.9.13.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/janino-3.0.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/JavaEWAH-0.3.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/javassist-3.18.1-GA.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/javax.annotation-api-1.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/javax.inject-1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/javax.inject-2.4.0-b34.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/java-xmlbuilder-1.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/javax.servlet-api-3.1.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/javax.ws.rs-api-2.0.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/javolution-5.5.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jaxb-api-2.2.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jcl-over-slf4j-1.7.16.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jdo-api-3.0.1.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jersey-client-2.22.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jersey-common-2.22.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jersey-container-servlet-2.22.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jersey-container-servlet-core-2.22.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jersey-guava-2.22.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jersey-media-jaxb-2.22.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jersey-server-2.22.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jets3t-0.9.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jetty-6.1.26.cloudera.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jetty-util-6.1.26.cloudera.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jline-2.12.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/joda-time-2.9.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jodd-core-3.5.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jpam-1.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/json4s-ast_2.11-3.2.11.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/json4s-core_2.11-3.2.11.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/json4s-jackson_2.11-3.2.11.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jsr305-1.3.9.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jta-1.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jtransforms-2.4.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/jul-to-slf4j-1.7.16.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/kryo-shaded-3.0.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/leveldbjni-all-1.8.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/libfb303-0.9.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/libthrift-0.9.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/log4j-1.2.17.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/lz4-1.3.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/machinist_2.11-0.6.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/macro-compat_2.11-1.1.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/mail-1.4.7.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/metrics-core-3.1.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/metrics-graphite-3.1.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/metrics-json-3.1.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/metrics-jvm-3.1.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/minlog-1.3.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/mx4j-3.0.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/netty-3.9.9.Final.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/netty-all-4.0.43.Final.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/objenesis-2.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/opencsv-2.3.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/oro-2.0.8.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/osgi-resource-locator-1.0.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/paranamer-2.6.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/parquet-column-1.8.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/parquet-common-1.8.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/parquet-encoding-1.8.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/parquet-format-2.3.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/parquet-hadoop-1.8.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/parquet-hadoop-bundle-1.6.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/parquet-jackson-1.8.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/pmml-model-1.2.15.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/pmml-schema-1.2.15.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/protobuf-java-2.5.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/py4j-0.10.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/pyrolite-4.13.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/RoaringBitmap-0.5.11.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/scala-compiler-2.11.8.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/scala-library-2.11.8.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/scalap-2.11.8.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/scala-parser-combinators_2.11-1.0.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/scala-reflect-2.11.8.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/scala-xml_2.11-1.0.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/shapeless_2.11-2.3.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/slf4j-api-1.7.16.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/slf4j-log4j12-1.7.16.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/snappy-0.2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/snappy-java-1.1.2.6.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-catalyst_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-core_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-graphx_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-hive_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-hive-thriftserver_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-launcher_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-mllib_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-mllib-local_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-network-common_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-network-shuffle_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-repl_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-sketch_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-sql_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-streaming_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-tags_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-unsafe_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spark-yarn_2.11-2.2.0.jar 
/home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spire_2.11-0.13.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/spire-macros_2.11-0.13.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/ST4-4.0.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/stax-api-1.0.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/stax-api-1.0-2.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/stream-2.7.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/stringtemplate-3.2.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/super-csv-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/univocity-parsers-2.2.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/validation-api-1.1.0.Final.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/xbean-asm5-shaded-4.4.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/xercesImpl-2.9.1.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/xmlenc-0.52.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/xz-1.0.jar /home/hadoop/sourcecode/spark-2.2.0/assembly/target/scala-2.11/jars/zookeeper-3.4.6.jar /home/hadoop/sourcecode/spark-2.2.0/dist/jars/
+ '[' -f /home/hadoop/sourcecode/spark-2.2.0/common/network-yarn/target/scala-2.11/spark-2.2.0-yarn-shuffle.jar ']'
+ mkdir /home/hadoop/sourcecode/spark-2.2.0/dist/yarn
+ cp /home/hadoop/sourcecode/spark-2.2.0/common/network-yarn/target/scala-2.11/spark-2.2.0-yarn-shuffle.jar /home/hadoop/sourcecode/spark-2.2.0/dist/yarn
+ mkdir -p /home/hadoop/sourcecode/spark-2.2.0/dist/examples/jars
+ cp /home/hadoop/sourcecode/spark-2.2.0/examples/target/scala-2.11/jars/scopt_2.11-3.3.0.jar /home/hadoop/sourcecode/spark-2.2.0/examples/target/scala-2.11/jars/spark-examples_2.11-2.2.0.jar /home/hadoop/sourcecode/spark-2.2.0/dist/examples/jars
+ for f in '"$DISTDIR"/examples/jars/*'
++ basename /home/hadoop/sourcecode/spark-2.2.0/dist/examples/jars/scopt_2.11-3.3.0.jar
+ name=scopt_2.11-3.3.0.jar
+ '[' -f /home/hadoop/sourcecode/spark-2.2.0/dist/jars/scopt_2.11-3.3.0.jar ']'
+ for f in '"$DISTDIR"/examples/jars/*'
++ basename /home/hadoop/sourcecode/spark-2.2.0/dist/examples/jars/spark-examples_2.11-2.2.0.jar
+ name=spark-examples_2.11-2.2.0.jar
+ '[' -f /home/hadoop/sourcecode/spark-2.2.0/dist/jars/spark-examples_2.11-2.2.0.jar ']'
+ mkdir -p /home/hadoop/sourcecode/spark-2.2.0/dist/examples/src/main
+ cp -r /home/hadoop/sourcecode/spark-2.2.0/examples/src/main /home/hadoop/sourcecode/spark-2.2.0/dist/examples/src/
+ cp /home/hadoop/sourcecode/spark-2.2.0/LICENSE /home/hadoop/sourcecode/spark-2.2.0/dist
+ cp -r /home/hadoop/sourcecode/spark-2.2.0/licenses /home/hadoop/sourcecode/spark-2.2.0/dist
+ cp /home/hadoop/sourcecode/spark-2.2.0/NOTICE /home/hadoop/sourcecode/spark-2.2.0/dist
+ '[' -e /home/hadoop/sourcecode/spark-2.2.0/CHANGES.txt ']'
+ cp -r /home/hadoop/sourcecode/spark-2.2.0/data /home/hadoop/sourcecode/spark-2.2.0/dist
+ '[' false == true ']'
+ echo 'Skipping building python distribution package'
Skipping building python distribution package
+ '[' false == true ']'
+ echo 'Skipping building R source package'
Skipping building R source package
+ mkdir /home/hadoop/sourcecode/spark-2.2.0/dist/conf
+ cp /home/hadoop/sourcecode/spark-2.2.0/conf/docker.properties.template /home/hadoop/sourcecode/spark-2.2.0/conf/fairscheduler.xml.template /home/hadoop/sourcecode/spark-2.2.0/conf/log4j.properties.template /home/hadoop/sourcecode/spark-2.2.0/conf/metrics.properties.template /home/hadoop/sourcecode/spark-2.2.0/conf/slaves.template /home/hadoop/sourcecode/spark-2.2.0/conf/spark-defaults.conf.template /home/hadoop/sourcecode/spark-2.2.0/conf/spark-env.sh.template /home/hadoop/sourcecode/spark-2.2.0/dist/conf
+ cp /home/hadoop/sourcecode/spark-2.2.0/README.md /home/hadoop/sourcecode/spark-2.2.0/dist
+ cp -r /home/hadoop/sourcecode/spark-2.2.0/bin /home/hadoop/sourcecode/spark-2.2.0/dist
+ cp -r /home/hadoop/sourcecode/spark-2.2.0/python /home/hadoop/sourcecode/spark-2.2.0/dist
+ '[' false == true ']'
+ cp -r /home/hadoop/sourcecode/spark-2.2.0/sbin /home/hadoop/sourcecode/spark-2.2.0/dist
+ '[' -d /home/hadoop/sourcecode/spark-2.2.0/R/lib/SparkR ']'
+ '[' true == true ']'
+ TARDIR_NAME=spark-2.2.0-bin-2.6.0-cdh5.7.0
+ TARDIR=/home/hadoop/sourcecode/spark-2.2.0/spark-2.2.0-bin-2.6.0-cdh5.7.0
+ rm -rf /home/hadoop/sourcecode/spark-2.2.0/spark-2.2.0-bin-2.6.0-cdh5.7.0
+ cp -r /home/hadoop/sourcecode/spark-2.2.0/dist /home/hadoop/sourcecode/spark-2.2.0/spark-2.2.0-bin-2.6.0-cdh5.7.0
+ tar czf spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz -C /home/hadoop/sourcecode/spark-2.2.0 spark-2.2.0-bin-2.6.0-cdh5.7.0
+ rm -rf /home/hadoop/sourcecode/spark-2.2.0/spark-2.2.0-bin-2.6.0-cdh5.7.0
[hadoop@hadoop01 spark-2.2.0]$
# When the build finishes, a tgz package is present in the Spark root directory
[hadoop@hadoop01 spark-2.2.0]$ ls -ltr spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz
-rw-rw-r-- 1 hadoop hadoop 199149808 Jul 5 10:40 spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz
[hadoop@hadoop01 spark-2.2.0]$
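The build log shows the archive name being assembled as TARDIR_NAME=spark-2.2.0-bin-2.6.0-cdh5.7.0, that is, the Spark version followed by the value passed to --name. A deployment script could derive the expected file name the same way. The sketch below only mirrors that observed pattern, and `expected_tgz` is a hypothetical helper name.

```shell
# expected_tgz VERSION NAME: file name matching the pattern seen in the build log
# (spark-<VERSION>-bin-<NAME>.tgz). Hypothetical helper.
expected_tgz() {
  echo "spark-$1-bin-$2.tgz"
}

tgz="$(expected_tgz "2.2.0" "2.6.0-cdh5.7.0")"
echo "$tgz"

# On the build host the archive could then be checked before it is copied, e.g.:
#   [ -f "$tgz" ] && tar -tzf "$tgz" | head
```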
Typically Spark is built in a local environment and then copied to the production server, where it is unpacked and installed.
# Copy the tgz package to the production server and extract it
[hadoop@hadoop01 software]$ tar -zxvf spark-2.2.0-bin-2.6.0-cdh5.7.0.tgz -C ~/app/
# Configure environment variables (switch to root)
[root@hadoop01 ~]# vi /etc/profile
export SPARK_HOME=/home/hadoop/app/spark-2.2.0-bin-2.6.0-cdh5.7.0
export PATH=$SPARK_HOME/bin:$PATH
[root@hadoop01 ~]# source /etc/profile
# Verify the installation
[hadoop@hadoop01 ~]$ spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/07/10 02:25:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/07/10 02:25:48 ERROR ObjectStore: Version information found in metastore differs 1.1.0 from expected schema version 1.2.0. Schema verififcation is disabled hive.metastore.schema.verification so setting version.
18/07/10 02:25:49 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://10.132.37.38:4040
Spark context available as 'sc' (master = local[*], app id = local-1531160743401).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
# Spark directory layout
[hadoop@hadoop01 software]$ cd ~/app/spark-2.2.0-bin-2.6.0-cdh5.7.0/
[hadoop@hadoop01 spark-2.2.0-bin-2.6.0-cdh5.7.0]$ ls -ltr
total 104
drwxrwxr-x 2 hadoop hadoop  4096 Jul 5 10:40 yarn       # YARN-related jars
-rw-r--r-- 1 hadoop hadoop  3809 Jul 5 10:40 README.md
-rw-r--r-- 1 hadoop hadoop 24645 Jul 5 10:40 NOTICE
drwxr-xr-x 2 hadoop hadoop  4096 Jul 5 10:40 licenses
-rw-r--r-- 1 hadoop hadoop 17881 Jul 5 10:40 LICENSE
drwxrwxr-x 4 hadoop hadoop  4096 Jul 5 10:40 examples   # example programs that ship with Spark ***** worth a close look
drwxr-xr-x 5 hadoop hadoop  4096 Jul 5 10:40 data       # test data
drwxrwxr-x 2 hadoop hadoop  4096 Jul 5 10:40 conf       # configuration files
drwxr-xr-x 2 hadoop hadoop  4096 Jul 5 10:40 sbin       # server-side scripts: starting and stopping the cluster, etc.
-rw-rw-r-- 1 hadoop hadoop   135 Jul 5 10:40 RELEASE
drwxr-xr-x 6 hadoop hadoop  4096 Jul 5 10:40 python
drwxrwxr-x 2 hadoop hadoop 16384 Jul 5 10:40 jars       # Spark's jar files
drwxr-xr-x 2 hadoop hadoop  4096 Jul 5 10:40 bin        # client-side scripts (the .cmd files are for Windows)
[hadoop@hadoop01 spark-2.2.0-bin-2.6.0-cdh5.7.0]$
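After unpacking on another server, it can be useful to confirm that the directories from the listing above actually arrived. The following is a rough sketch: `check_spark_layout` is a hypothetical helper, the directory names are taken from the listing, and the demo runs against a scratch directory instead of a real SPARK_HOME.

```shell
# check_spark_layout DIR: print each expected sub-directory that is missing from DIR
# and return non-zero if any is missing. Hypothetical helper.
check_spark_layout() {
  local dir="$1" missing=0 d
  for d in bin sbin conf jars examples data python yarn; do
    if [ ! -d "$dir/$d" ]; then
      echo "missing: $d"
      missing=1
    fi
  done
  return "$missing"
}

# Scratch demo: create every directory except yarn, then run the check.
demo="$(mktemp -d)"
for d in bin sbin conf jars examples data python; do
  mkdir "$demo/$d"
done
check_spark_layout "$demo" || echo "layout incomplete"
```

On a real host the call would be `check_spark_layout "$SPARK_HOME"`, using the SPARK_HOME value exported in /etc/profile.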
1. Download the Spark source code
2. Build the Spark source code
2.1 Install Maven
2.2 Install JDK 8+
2.3 Install Scala
2.4 Install git (make-distribution may need it during the build) (switch to root)
2.5 Build Spark
3. Install Spark