cloudml / zen

Zen aims to provide the largest-scale and most efficient machine learning platform on top of Spark, including but not limited to logistic regression, latent Dirichlet allocation, factorization machines and DNN.
Apache License 2.0
170 stars, 75 forks

Documentation to /run/ examples? #68

Open cmacdonald opened 7 years ago

cmacdonald commented 7 years ago

Hello,

Could you provide some documentation on how to run the examples? I'm having issues compiling: because mvn install is disabled, it's difficult to run mvn exec:java from inside the examples directory. Is there a way to get a single shaded jar for all sub-modules? The jar created in assembly/target does not include the examples, and with install disabled it's not clear to me how to run an example.

Any advice?

Craig

witgo commented 7 years ago

@cmacdonald Run the command mvn install clean package -DskipTests to generate a single shaded jar for all sub-modules; it is written to examples/target/scala-2.*/zen-examples-*.jar

cmacdonald commented 7 years ago

I'm still not able to compile: I get an OutOfMemory error from Maven. The end of the Maven log file is enclosed below.

[DEBUG] Recompiling all 12 sources: invalidated sources (12) exceeded 50.0% of all sources
[INFO] Compiling 12 Scala sources to /private/tmp/zen/ml/target/scala-2.11/test-classes...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Zen Project Parent POM ............................. SUCCESS [  3.536 s]
[INFO] Zen Project ML Library ............................. FAILURE [02:24 min]
[INFO] Zen Project Assembly ............................... SKIPPED
[INFO] Zen Project Examples ............................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 02:28 min
[INFO] Finished at: 2017-02-07T17:55:59+00:00
[INFO] Final Memory: 33M/1155M
[INFO] ------------------------------------------------------------------------
[ERROR] PermGen space -> [Help 1]
java.lang.OutOfMemoryError: PermGen space
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at scala.collection.immutable.Map$Map2.updated(Map.scala:130)
    at scala.collection.immutable.Map$Map2.$plus(Map.scala:131)
    at scala.collection.immutable.Map$Map2.$plus(Map.scala:120)
    at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:29)
    at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:25)
    at scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:59)
    at scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:59)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
    at scala.collection.mutable.MapBuilder.$plus$plus$eq(MapBuilder.scala:25)
    at scala.collection.generic.GenMapFactory.apply(GenMapFactory.scala:48)
    at scala.sys.package$.env(package.scala:61)
    at scala.tools.nsc.settings.ScalaSettings$class.defaultClasspath(ScalaSettings.scala:33)
    at scala.tools.nsc.settings.MutableSettings.defaultClasspath(MutableSettings.scala:19)
    at scala.tools.nsc.settings.ScalaSettings$class.$init$(ScalaSettings.scala:66)
    at scala.tools.nsc.settings.MutableSettings.<init>(MutableSettings.scala:20)
    at scala.tools.nsc.Settings.<init>(Settings.scala:12)
    at xsbt.CachedCompiler0.<init>(CompilerInterface.scala:71)
    at xsbt.CompilerInterface.newCompiler(CompilerInterface.scala:24)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] 

I have tried to increase the memory settings in the pom.xml file:

diff --git a/pom.xml b/pom.xml
index ae634ac..821ba49 100644
--- a/pom.xml
+++ b/pom.xml
@@ -66,9 +66,9 @@
     <scala.version>2.11.8</scala.version>
     <scala.binary.version>2.11</scala.binary.version>
     <commons.math3.version>3.4.1</commons.math3.version>
-    <PermGen>64m</PermGen>
-    <MaxPermGen>512m</MaxPermGen>
-    <CodeCacheSize>512m</CodeCacheSize>
+    <PermGen>512m</PermGen>
+    <MaxPermGen>1404m</MaxPermGen>
+    <CodeCacheSize>1024m</CodeCacheSize>
     <zen.test.home>${session.executionRootDirectory}</zen.test.home>
   </properties>
   <prerequisites>

Any other hints?

witgo commented 7 years ago

What version of Maven are you using? You can try setting the MAVEN_OPTS environment variable:

export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"

cmacdonald commented 7 years ago

Maven version 3.3.9. Yes, changing MAVEN_OPTS let me compile the jar files.

However, to run the example I had to change the parent pom.xml again, because it disables installation of the jar files:

diff --git a/pom.xml b/pom.xml
index ae634ac..9375f8a 100644
--- a/pom.xml
+++ b/pom.xml
@@ -66,9 +66,9 @@
     <scala.version>2.11.8</scala.version>
     <scala.binary.version>2.11</scala.binary.version>
     <commons.math3.version>3.4.1</commons.math3.version>
-    <PermGen>64m</PermGen>
-    <MaxPermGen>512m</MaxPermGen>
-    <CodeCacheSize>512m</CodeCacheSize>
+    <PermGen>512m</PermGen>
+    <MaxPermGen>1404m</MaxPermGen>
+    <CodeCacheSize>1024m</CodeCacheSize>
     <zen.test.home>${session.executionRootDirectory}</zen.test.home>
   </properties>
   <prerequisites>
@@ -277,14 +277,12 @@
           <groupId>org.apache.maven.plugins</groupId>
           <artifactId>maven-deploy-plugin</artifactId>
           <configuration>
-            <skip>true</skip>
           </configuration>
         </plugin>
         <plugin>
           <groupId>org.apache.maven.plugins</groupId>
           <artifactId>maven-install-plugin</artifactId>
           <configuration>
-            <skip>true</skip>
           </configuration>
         </plugin>
         <plugin>

So, after applying that diff, the final invocations to get something to run are:

MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m" mvn clean install -DskipTests
cd examples
mvn scala:run -DmainClass=com.github.cloudml.zen.examples.ml.LambdaMARTRunner
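
The steps above can be collected into one function. This is a minimal sketch, assuming the install/deploy skip entries have already been removed from the parent pom.xml as in the diff, and that the examples module lives in the examples directory mentioned earlier in the thread. The names run, build_and_run_example and DRY_RUN are hypothetical conveniences, not part of Zen.

```shell
# Sketch only: wraps the commands from this thread in one function.
# run, build_and_run_example and DRY_RUN are illustrative names, not part of Zen.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

build_and_run_example() {
  main_class="${1:-com.github.cloudml.zen.examples.ml.LambdaMARTRunner}"
  examples_dir="${2:-examples}"   # assumed module directory

  # JVM options for the Maven process itself; each flag takes a single dash.
  # -XX:MaxPermSize only matters on Java 7 and earlier, where PermGen exists.
  MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
  export MAVEN_OPTS

  run mvn clean install -DskipTests || return 1

  # The subshell keeps the caller's working directory unchanged.
  ( run cd "$examples_dir" && run mvn scala:run "-DmainClass=$main_class" )
}
```

Setting DRY_RUN=1 before calling build_and_run_example prints the three commands instead of running them, which is a cheap way to check the arguments before starting a long build.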

What would help now is an example of the options for LambdaMARTRunner, e.g. with the LETOR or MSLR datasets, or at least a description of the expected input format, so that I know how to transform such standard datasets.

Craig
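
On the dataset side of the question: the public MSLR and LETOR 4.0 files use an SVMlight-style line layout, with a relevance label, a qid:<id> token, then feature:value pairs, and LETOR lines may end in a # comment. Whether LambdaMARTRunner reads this layout directly is not stated in this thread. The sketch below only splits such a line into its parts, as a starting point for rewriting it into whatever format the runner turns out to expect. The function name split_letor_line, its output layout and the sample line are made up for illustration.

```shell
# Sketch only: split one LETOR/MSLR-style line into label, query id and features.
# split_letor_line is a hypothetical helper; its output layout is arbitrary.
split_letor_line() {
  printf '%s\n' "$1" | awk '{
    sub(/[ \t]*#.*$/, "")              # drop a trailing comment, if any
    label = $1
    qid = $2; sub(/^qid:/, "", qid)    # strip the qid: prefix
    printf "label=%s qid=%s features=", label, qid
    for (i = 3; i <= NF; i++) {
      split($i, kv, ":")               # kv[1] is the feature id, kv[2] its value
      printf "%s%s=%s", (i > 3 ? "," : ""), kv[1], kv[2]
    }
    printf "\n"
  }'
}

# Example call with a made-up line:
# split_letor_line "2 qid:10 1:0.5 2:0.25 #docid = GX01"
```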