BIDData / BIDMat

A CPU and GPU-accelerated matrix library for data mining
BSD 3-Clause "New" or "Revised" License

How to make BIDMat as the GPU Backend for Spark? #22

Open databig opened 9 years ago

databig commented 9 years ago

@jcanny I want to leverage GPU resources in Spark, for example to use the GPU for matrix computation. How can I configure BIDMat as the GPU backend for Spark? I use Maven, so what should I add to the pom.xml (attached)?

(Attached: the Spark parent POM, `org.apache.spark:spark-parent` version 1.1.0, packaging `pom`, SCM tag v1.1.0-rc4. It lists the modules core, bagel, graphx, mllib, tools, streaming, sql/catalyst, sql/core, sql/hive, repl, assembly, external/twitter, external/kafka, external/flume, external/flume-sink, external/zeromq, external/mqtt, and examples, builds against Scala 2.10.4, and already includes `net.jpountz.lz4:lz4` 1.2.0 among its dependencies. It contains no BIDMat, JCuda, or HDF5 entries.)

jcanny commented 9 years ago

The challenge is to include all the jars with native libs that BIDMat uses. These are currently: JCUDA, LZ4, HDF5-Java, and BIDMat's own native libs.

All of the native libs except HDF5 can be bundled into a Jar, and will automatically be unpacked when the calling class is loaded. The JVM will need access to a tmp directory to do the unpacking.
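
The unpack-on-load behavior described above can be sketched roughly as follows. This is a minimal illustration of the general technique, not BIDMat's or JCuda's actual loader: the class `NativeLibUnpacker` and its method names are hypothetical, and the resource path is whatever location the library was given inside the jar. The temp file is created in the JVM's default temp directory (`java.io.tmpdir`), which is why the process needs write access there.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: unpacks a native library bundled in a jar and loads it. */
final class NativeLibUnpacker {
    private NativeLibUnpacker() {}

    /**
     * Copies the classpath resource at resourcePath into a fresh temp file
     * and returns the path of that file. Nothing is loaded here.
     */
    static Path extract(Class<?> anchor, String resourcePath, String suffix) throws IOException {
        try (InputStream in = anchor.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IOException("resource not found on classpath: " + resourcePath);
            }
            // Created under java.io.tmpdir; the JVM needs write access there.
            Path tmp = Files.createTempFile("nativelib-", suffix);
            tmp.toFile().deleteOnExit();
            // createTempFile already created the file, so overwrite it.
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        }
    }

    /** Extracts the resource and loads it as a native library by absolute path. */
    static void load(Class<?> anchor, String resourcePath, String suffix) throws IOException {
        Path lib = extract(anchor, resourcePath, suffix);
        System.load(lib.toAbsolutePath().toString());
    }
}
```

A caller would typically invoke `load` from a static initializer of the class that declares the native methods, so that the library is unpacked the first time that class is used. On a Spark cluster this happens separately in every executor JVM, so each worker needs a writable temp directory.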

HDF5 doesn't support this (or didn't as of the version we use). I don't remember whether it can run at all without the native libs; I don't think so. It may not be too relevant when running on Spark, where I/O is managed by Spark. Does Spark have LZ4 support already?

It might be better to make a single assembly jar that includes all of the dependent jars and native libs. Otherwise someone would have to commit to maintaining repo copies of the dependencies.
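
One way to check that such an assembly jar really contains every dependency is to probe the classpath for one representative class per library. The sketch below is only an illustration: `ClasspathCheck` is a hypothetical helper, and the class names in `main` (`BIDMat.Mat`, `jcuda.runtime.JCuda`, `net.jpountz.lz4.LZ4Factory`, `ncsa.hdf.hdf5lib.H5`) are assumptions about what each library ships, so they should be adjusted to the versions actually in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: reports which expected classes are absent from the classpath. */
final class ClasspathCheck {
    private ClasspathCheck() {}

    /** Returns the subset of classNames that cannot be located. */
    static List<String> missing(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = ClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only locate the class; do not run static
                // initializers, which might try to load native code.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed representative classes, one per library named in this thread.
        List<String> expected = Arrays.asList(
                "BIDMat.Mat",
                "jcuda.runtime.JCuda",
                "net.jpountz.lz4.LZ4Factory",
                "ncsa.hdf.hdf5lib.H5");
        System.out.println("missing from classpath: " + missing(expected));
    }
}
```

Running `main` with the assembly jar on the classpath gives a quick report of what is missing. It only confirms that the Java classes are present; whether the native libraries load still has to be tested on a machine with the right CUDA runtime.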

databig commented 9 years ago

@jcanny I will try. Could you please check whether this is right?

1. Declare the version numbers: 1.0.0, 6.5
2. Add to the dependencies:
   - bidmat? `${bidmat.version}`
   - jcuda? jcuda? `${jcuda.version}`
   - HDF5-Java? HDF5-Java? ???
   - LZ4? LZ4? ???

【?】 means I am not sure. Could you please confirm?

![screenshot 2015-03-18 17 26 57](https://cloud.githubusercontent.com/assets/5568415/6719844/12d4dc30-cd94-11e4-95bf-8a68f35fcdf0.png)
![screenshot 2015-03-18 17 27 22](https://cloud.githubusercontent.com/assets/5568415/6719845/12d8ae78-cd94-11e4-90f1-e4089da5e50e.png)
![screenshot 2015-03-18 17 27 29](https://cloud.githubusercontent.com/assets/5568415/6719846/12da1056-cd94-11e4-9adb-23f91415446f.png)

Thanks.

jcanny commented 9 years ago

We should talk on the phone so I'm clear on what you're trying to do (build with Maven?). Most of the libs are not in repos, so we'd have to figure something else out.

-John

jcanny commented 8 years ago

I have a team working on this now, so it should be resolved over the next few months.

RoiViber commented 7 years ago

@jcanny Hi. Any resolution on this? Do you have open-source code for coupling BIDMat with Spark? Thanks.