ddps-lab / distributed-matrix-completion

Import marlin into the current Spark version #7

Closed kmu-leeky closed 7 years ago

kmu-leeky commented 7 years ago

https://github.com/PasaLab/marlin

KimJeongChul commented 7 years ago

bd-2, container-id: matrix
Spark version 2.0.2, Hadoop 2.6.0
https://github.com/PasaLab/marlin/tree/matrix-analysis-spark2.0

Build Custom Spark (APIs of Spark 2.0.2 version)

http://spark.apache.org/docs/2.0.2/building-spark.html#buildmvn

$ git clone https://github.com/PasaLab/marlin
$ cd marlin
$ git checkout matrix-analysis-spark2.0
$ cd spark-2.0.2-src/build/
$ chmod 755 mvn && cd ..
# Apache Hadoop 2.6.X
$ ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests clean package
KimJeongChul commented 7 years ago

Build Error 1

[error] /root/marlin/spark-2.0.2-src/graphx/src/main/scala/org/apache/spark/graphx/GraphOps.scala:24: object lib is not a member of package org.apache.spark.graphx

In the existing spark-shell, import org.apache.spark.graphx.lib._ works without any problem.

Try

$ git clone https://github.com/apache/spark
$ cd spark
$ git checkout branch-2.0
$ cd graphx/src/main/scala/org/apache/spark/graphx
$ cp -r lib ~/marlin/spark-2.0.2-src/graphx/src/main/scala/org/apache/spark/graphx
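
The copy step above can be checked before rebuilding. The following is a minimal sketch, assuming the marlin checkout lives in ~/marlin as in the commands above; the function name has_graphx_lib is hypothetical and not part of Spark or marlin. It only tests that the graphx lib package directory exists and contains Scala sources.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark or marlin): succeeds if the given
# Spark source tree has at least one .scala file in the graphx "lib" package.
has_graphx_lib() {
    src_root="$1"
    lib_dir="$src_root/graphx/src/main/scala/org/apache/spark/graphx/lib"
    # The directory must exist and the glob must match at least one file.
    [ -d "$lib_dir" ] && ls "$lib_dir"/*.scala >/dev/null 2>&1
}

# Example: check the marlin tree (path assumed from the commands in this thread).
if has_graphx_lib "$HOME/marlin/spark-2.0.2-src"; then
    echo "graphx lib present"
else
    echo "graphx lib missing"
fi
```

If this reports "graphx lib missing" after the cp, the lib directory was probably copied to a different path than the one GraphOps.scala is compiled from.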

Build Error 2

spark-catalyst: Could not resolve dependencies for project org.apache.spark:spark-catalyst_2.11

Try

$ ./dev/change-scala-version.sh 2.11
$ ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests -Dscala-2.11 clean package

Build Error 3

[ERROR] Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:testCompile (scala-test-compile-first) on project spark-sql_2.11: Execution scala-test-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.2:testCompile failed. CompileFailed -> [Help 1]

[ERROR] /root/marlin/spark-2.0.2-src/sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala:352: object creation impossible, since: it has 2 unimplemented members.

Try

http://spark.apache.org/docs/2.0.2/building-spark.html#speeding-up-compilation-with-zinc

$ ./build/zinc-0.3.9/bin/zinc -shutdown # the build still failed after this (X)

Try 2

$ ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests -Dscala-2.11 -DrecompileMode=all clean package

Try 3

$ vi pom.xml

$ ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests -Dscala-2.11 -DrecompileMode=all -X -rf :spark-sql_2.11 clean package
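
In the command above, -rf (--resume-from) tells Maven to restart the reactor at the named module instead of rebuilding everything, and -X turns on debug output. A minimal sketch of a wrapper that assembles the same command line, with or without a resume point, is below. The function name spark_build_cmd is hypothetical, the flags are copied from this thread, and -X is left out to keep the output short.

```shell
#!/bin/sh
# Hypothetical helper: prints the Maven command line used in this thread.
# The optional first argument is a module to resume from, e.g. ":spark-sql_2.11".
spark_build_cmd() {
    resume_from="$1"   # empty means build every module from the start
    cmd="./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests -Dscala-2.11 -DrecompileMode=all"
    if [ -n "$resume_from" ]; then
        cmd="$cmd -rf $resume_from"
    fi
    echo "$cmd clean package"
}

# Print the command for a build resumed at the module that failed.
spark_build_cmd ":spark-sql_2.11"
```

Printing the command rather than running it makes it easy to check before starting a long build.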

KimJeongChul commented 7 years ago

Build Success

$ ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 -DskipTests -Dscala-2.11 -DrecompileMode=all -X -rf :spark-sql_2.11 clean package

KimJeongChul commented 7 years ago

$ chmod 755 bin/*
$ ./bin/spark-shell

Result of running https://github.com/kmu-bigdata/distributed-matrixcompletion/blob/master/spark_square_matrix_matmul.scala in the shell:

[screenshot of the result]

kmu-leeky commented 7 years ago

Let's run MatrixMultiply.scala from the example folder and then close this.

kmu-leeky commented 7 years ago

A comparison with MatFast looks more necessary than measuring Marlin's performance: http://ieeexplore.ieee.org/document/7930046/. Closing.