ensozos / Matrix-Profile

A Java library for Matrix Profile
https://ensozos.github.io/Matrix-Profile/
MIT License

Running Matrix-Profile inside Spark on Windows #10

Closed barrybecker4 closed 5 years ago

barrybecker4 commented 5 years ago

I have a Spark project on Linux where I was able to add code that depends on Matrix-Profile by adding

"io.github.ensozos" % "matrix-profile" % "0.0.3" % "provided"

to the build.sbt file. I used "provided" so that the huge number of C++ library dependencies required by Matrix-Profile would not be included in the jar that I deploy to spark-jobserver. Spark still needs those dependencies, so I need to add them to the spark-submit command used to launch Spark, via the --packages option. This is what I added:

--packages "io.github.ensozos:matrix-profile:0.0.3,org.nd4j:nd4j-native-platform:1.0.0-beta2"

Then when I run the spark-submit command to start spark-jobserver (using server_start.bat) on Windows, it downloads a lot of dependencies that look correct. Here is a sampling of what I see in the console:

:: loading settings :: url = jar:file:/C:/apps/spark-2.3.0-bin-windows/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
io.github.ensozos#matrix-profile added as a dependency
org.nd4j#nd4j-native-platform added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
        found io.github.ensozos#matrix-profile;0.0.3 in central
        found org.nd4j#nd4j-native-platform;1.0.0-beta2 in central
        found org.bytedeco.javacpp-presets#openblas-platform;0.3.0-1.4.2 in central
        found org.bytedeco.javacpp-presets#openblas;0.3.0-1.4.2 in central
        found org.bytedeco#javacpp;1.4.2 in central
        found org.bytedeco.javacpp-presets#mkl-platform;2018.3-1.4.2 in central
        found org.bytedeco.javacpp-presets#mkl;2018.3-1.4.2 in central
        found org.bytedeco.javacpp-presets#mkl-dnn-platform;0.15-1.4.2 in central
        found org.bytedeco.javacpp-presets#mkl-dnn;0.15-1.4.2 in central
        found org.nd4j#nd4j-native;1.0.0-beta2 in central
        found org.nd4j#nd4j-native-api;1.0.0-beta2 in central
        found org.nd4j#nd4j-buffer;1.0.0-beta2 in central
        found org.nd4j#nd4j-context;1.0.0-beta2 in central
        found org.nd4j#nd4j-common;1.0.0-beta2 in central
        found org.nd4j#jackson;1.0.0-beta2 in central
        found org.yaml#snakeyaml;1.12 in central
        found org.codehaus.woodstox#stax2-api;3.1.4 in central
        found joda-time#joda-time;2.2 in central
        found org.slf4j#slf4j-api;1.7.21 in central
        found commons-io#commons-io;2.5 in central
        found org.apache.commons#commons-math3;3.5 in central
        found org.apache.commons#commons-lang3;3.6 in central
        found org.apache.commons#commons-compress;1.16.1 in central
        found org.objenesis#objenesis;2.6 in central
        found com.google.guava#guava;20.0 in central
        found commons-codec#commons-codec;1.10 in local-m2-cache
        found org.nd4j#nd4j-api;1.0.0-beta2 in central
        found com.vlkan#flatbuffers;1.2.0-3f79e055 in central
        found com.github.os72#protobuf-java-shaded-351;0.9 in central
        found com.github.os72#protobuf-java-util-shaded-351;0.9 in central
        found com.google.code.gson#gson;2.7 in central
        found uk.com.robust-it#cloning;1.9.3 in central
        found net.ericaro#neoitertools;1.0.0 in central
        found com.github.wendykierp#JTransforms;3.1 in central
        found pl.edu.icm#JLargeArrays;1.5 in central
downloading https://repo1.maven.org/maven2/io/github/ensozos/matrix-profile/0.0.3/matrix-profile-0.0.3.jar ...
        [SUCCESSFUL ] io.github.ensozos#matrix-profile;0.0.3!matrix-profile.jar (103ms)
:
:
downloading https://repo1.maven.org/maven2/pl/edu/icm/JLargeArrays/1.5/JLargeArrays-1.5.jar ...
        [SUCCESSFUL ] pl.edu.icm#JLargeArrays;1.5!JLargeArrays.jar (70ms)
:: resolution report :: resolve 20313ms :: artifacts dl 15756ms
        :: modules in use:
        com.github.os72#protobuf-java-shaded-351;0.9 from central in [default]
        com.github.os72#protobuf-java-util-shaded-351;0.9 from central in [default]
        com.github.wendykierp#JTransforms;3.1 from central in [default]
        com.google.code.gson#gson;2.7 from central in [default]
        com.google.guava#guava;20.0 from central in [default]
        com.vlkan#flatbuffers;1.2.0-3f79e055 from central in [default]
        commons-codec#commons-codec;1.10 from local-m2-cache in [default]
        commons-io#commons-io;2.5 from central in [default]
        io.github.ensozos#matrix-profile;0.0.3 from central in [default]
        joda-time#joda-time;2.2 from central in [default]
        net.ericaro#neoitertools;1.0.0 from central in [default]
        org.apache.commons#commons-compress;1.16.1 from central in [default]
        org.apache.commons#commons-lang3;3.6 from central in [default]
        org.apache.commons#commons-math3;3.5 from central in [default]
        org.bytedeco#javacpp;1.4.2 from central in [default]
        org.bytedeco.javacpp-presets#mkl;2018.3-1.4.2 from central in [default]
        org.bytedeco.javacpp-presets#mkl-dnn;0.15-1.4.2 from central in [default]
        org.bytedeco.javacpp-presets#mkl-dnn-platform;0.15-1.4.2 from central in [default]
        org.bytedeco.javacpp-presets#mkl-platform;2018.3-1.4.2 from central in [default]
        org.bytedeco.javacpp-presets#openblas;0.3.0-1.4.2 from central in [default]
        org.bytedeco.javacpp-presets#openblas-platform;0.3.0-1.4.2 from central in [default]
        org.codehaus.woodstox#stax2-api;3.1.4 from central in [default]
        org.nd4j#jackson;1.0.0-beta2 from central in [default]
        org.nd4j#nd4j-api;1.0.0-beta2 from central in [default]
        org.nd4j#nd4j-buffer;1.0.0-beta2 from central in [default]
        org.nd4j#nd4j-common;1.0.0-beta2 from central in [default]
        org.nd4j#nd4j-context;1.0.0-beta2 from central in [default]
        org.nd4j#nd4j-native;1.0.0-beta2 from central in [default]
        org.nd4j#nd4j-native-api;1.0.0-beta2 from central in [default]
        org.nd4j#nd4j-native-platform;1.0.0-beta2 from central in [default]
        org.objenesis#objenesis;2.6 from central in [default]
        org.slf4j#slf4j-api;1.7.21 from central in [default]
        org.yaml#snakeyaml;1.12 from central in [default]
        pl.edu.icm#JLargeArrays;1.5 from central in [default]
        uk.com.robust-it#cloning;1.9.3 from central in [default]
        :: evicted modules:
        org.objenesis#objenesis;2.1 by [org.objenesis#objenesis;2.6] in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   36  |   35  |   35  |   1   ||   36  |   36  |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
        confs: [default]
        36 artifacts copied, 0 already retrieved (81691kB/453ms)
[2018-11-26 07:37:35,849] WARN  doop.util.NativeCodeLoader [] [] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Warning: Local jar C:\Users\BBE\.ivy2\jars\org.nd4j_nd4j-native-1.0.0-beta2.jar does not exist, skipping.
Warning: Local jar C:\Users\BBE\.ivy2\jars\org.bytedeco.javacpp-presets_mkl-2018.3-1.4.2.jar does not exist, skipping.
Warning: Local jar C:\Users\BBE\.ivy2\jars\org.bytedeco.javacpp-presets_mkl-dnn-0.15-1.4.2.jar does not exist, skipping.

However, when I try to run anything through spark-jobserver, I get errors like this:

[2018-11-26 07:41:36,286] ERROR .apache.spark.SparkContext [] [akka://JobServer/user/context-supervisor/sql-context] - Failed to add file:/C:/Users/BBE/.ivy2/jars/org.bytedeco.javacpp-presets_mkl-2018.3-1.4.2.jar to Spark environment
java.io.FileNotFoundException: Jar C:\Users\BBE\.ivy2\jars\org.bytedeco.javacpp-presets_mkl-2018.3-1.4.2.jar not found
        at org.apache.spark.SparkContext.addJarFile$1(SparkContext.scala:1807)
        at org.apache.spark.SparkContext.addJar(SparkContext.scala:1837)
        at org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:457)
        :
file:/C:/Users/BBE/.ivy2/jars/org.bytedeco.javacpp-presets_mkl-dnn-0.15-1.4.2.jar to Spark environment
java.io.FileNotFoundException: Jar C:\Users\BBE\.ivy2\jars\org.bytedeco.javacpp-presets_mkl-dnn-0.15-1.4.2.jar not found
        at org.apache.spark.SparkContext.addJarFile$1(SparkContext.scala:1807)
        at org.apache.spark.SparkContext.addJar(SparkContext.scala:1837)
        at org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:457)
        at org.apache.spark.SparkContext$$anonfun$12.apply(SparkContext.scala:457)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:457)
        at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2486)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
        at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
        at spark.jobserver.context.SessionContextFactory.makeContext(SessionContextFactory.scala:36)
        at spark.jobserver.context.SessionContextFactory.makeContext(SessionContextFactory.scala:23)
        at spark.jobserver.context.SparkContextFactory$class.makeContext(SparkContextFactory.scala:64)
        :  

When I look in the .ivy2\jars directory, I do see what look to be the correct Windows versions of these artifacts:

org.bytedeco.javacpp-presets_mkl-2018.3-1.4.2-windows-x86_64.jar
org.bytedeco.javacpp-presets_mkl-dnn-0.15-1.4.2-windows-x86_64.jar
org.bytedeco.javacpp-presets_openblas-0.3.0-1.4.2-windows-x86_64.jar
:

But those are not the jars it appears to be looking for in the errors above. Is there a way to get it to look for the correct platform-specific versions of these jars? Maybe I need to compile the jar on Windows instead of Linux before deploying it. In the past it has always worked fine to build on Linux and deploy to Windows, but maybe these C++ dependencies change that.
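
For illustration only, sbt can pin platform-specific artifacts via a classifier; a sketch is below. The coordinates and versions are copied from the resolution log above, and the "windows-x86_64" classifier comes from the jar names in .ivy2\jars; whether this actually resolves the problem here is unverified.

// Sketch (unverified): declare the Windows-specific natives explicitly in build.sbt.
// Coordinates and versions are taken from the resolution log above; "windows-x86_64"
// is the classifier seen on the jars in .ivy2\jars.
libraryDependencies ++= Seq(
  "io.github.ensozos" % "matrix-profile" % "0.0.3" % "provided",
  "org.nd4j" % "nd4j-native" % "1.0.0-beta2" classifier "windows-x86_64",
  "org.bytedeco.javacpp-presets" % "openblas" % "0.3.0-1.4.2" classifier "windows-x86_64",
  "org.bytedeco.javacpp-presets" % "mkl" % "2018.3-1.4.2" classifier "windows-x86_64",
  "org.bytedeco.javacpp-presets" % "mkl-dnn" % "0.15-1.4.2" classifier "windows-x86_64"
)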

barrybecker4 commented 5 years ago

I built my Spark/Scala code (which depends on Matrix-Profile) on Windows.

When I start jobserver with spark-submit ... --packages "io.github.ensozos:matrix-profile:0.0.3,org.nd4j:nd4j-native-platform:1.0.0-beta2", I get this error:

org.apache.spark#spark-submit-parent: java.lang.RuntimeException:
   Multiple artifacts of the module org.bytedeco.javacpp-presets#openblas;0.3.0-1.4.2 are retrieved to the same file! 
   Update the retrieve pattern  to fix this error.

But when I run with --packages "io.github.ensozos:matrix-profile:0.0.3,org.nd4j:nd4j-native-platform:1.0.0-beta3", it does not give the above error; however, when I try to execute code that uses Matrix-Profile, I get an error like this:

ND4J is probably missing dependencies. For more information, please refer to: http://nd4j.org/getstarted.html
      java.lang.UnsatisfiedLinkError: no nd4jcpu in java.library.path
              at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)

Maybe we need to compile Matrix-Profile with nd4j-native-platform:1.0.0-beta3 instead of nd4j-native-platform:1.0.0-beta2.
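
As a quick sanity check (a sketch, not something from the thread itself), the JVM properties that matter for native loading can be printed from inside the Spark job to see what the loader is actually searching:

// Sketch: dump the properties relevant to native library loading.
// java.library.path is the path named in the UnsatisfiedLinkError above;
// os.name and os.arch are what platform detection typically keys on.
println("java.library.path = " + System.getProperty("java.library.path"))
println("os.name           = " + System.getProperty("os.name"))
println("os.arch           = " + System.getProperty("os.arch"))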

barrybecker4 commented 5 years ago

I built Matrix-Profile with nd4j-native-platform:1.0.0-beta3, deployed it to our own JFrog Artifactory, and loaded this version into spark-jobserver at startup, but when I execute code that uses it, I still get the unsatisfied link error.

Caused by: java.lang.RuntimeException: ND4J is probably missing dependencies. For more information, please refer to: http://nd4j.org/getstarted.html
        at org.nd4j.nativeblas.NativeOpsHolder.<init>(NativeOpsHolder.java:68)
        at org.nd4j.nativeblas.NativeOpsHolder.<clinit>(NativeOpsHolder.java:36)
        ... 43 more
Caused by: java.lang.UnsatisfiedLinkError: no jnind4jcpu in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:1258)
        at org.bytedeco.javacpp.Loader.load(Loader.java:999)
        at org.bytedeco.javacpp.Loader.load(Loader.java:891)
        at org.nd4j.nativeblas.Nd4jCpu.<clinit>(Nd4jCpu.java:10)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.bytedeco.javacpp.Loader.load(Loader.java:950)
        at org.bytedeco.javacpp.Loader.load(Loader.java:891)
        at org.nd4j.nativeblas.Nd4jCpu$NativeOps.<clinit>(Nd4jCpu.java:1613)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.nd4j.nativeblas.NativeOpsHolder.<init>(NativeOpsHolder.java:46)
        ... 44 more
Caused by: java.lang.UnsatisfiedLinkError: no nd4jcpu in java.library.path
        at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1867)
        at java.lang.Runtime.loadLibrary0(Runtime.java:870)
        at java.lang.System.loadLibrary(System.java:1122)
        at org.bytedeco.javacpp.Loader.loadLibrary(Loader.java:1258)
        at org.bytedeco.javacpp.Loader.load(Loader.java:977)

barrybecker4 commented 5 years ago

These links may be useful for debugging:
https://github.com/bytedeco/javacpp-presets/wiki/Debugging-UnsatisfiedLinkError-on-Windows#using-dependency-walker
https://www.chilkatsoft.com/java-loadlibrary-windows.asp

ensozos commented 5 years ago

Is this a problem with the uber jar not being included? If so, use the Gradle shade plugin to generate an uber jar.

barrybecker4 commented 5 years ago

Maybe. I added the shade plugin and am running it now. It's taking a really long time: first it builds a huge jar (around half a gigabyte), then it tries to upload it to the JFrog Artifactory over my slow connection.

I thought that adding org.nd4j:nd4j-native-platform to the list of packages, like this: --packages "io.github.ensozos:matrix-profile:0.0.4-mineset,org.nd4j:nd4j-native-platform:1.0.0-beta3" in the spark-submit script, would retrieve all the required native libraries. They all seem to be there in .ivy2, but maybe they are not on the path. I added .../BBE/.ivy2/jars to my path and restarted, but I still get the unsatisfied link error. Maybe I need to use Dependency Walker to find out exactly what it's not getting.
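
Note that the Windows PATH and the JVM's java.library.path (the path named in the UnsatisfiedLinkError) are separate settings. A hedged sketch of the Spark setting involved follows; whether it helps in this particular setup is unverified, and the directory is simply the one mentioned above:

// Sketch (unverified): the standard Spark setting for extra driver JVM options,
// here used to name a java.library.path entry instead of relying on the OS PATH.
// Note: to affect the driver this must reach the JVM before it starts, e.g. via
// spark-submit --conf or spark-defaults.conf, not from inside the running job.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions",
       "-Djava.library.path=C:\\Users\\BBE\\.ivy2\\jars")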

barrybecker4 commented 5 years ago

I got the shaded jar uploaded to the Artifactory. It appears next to the smaller one, with -all.jar at the end; "all" is the name of the classifier. Unfortunately, due to https://issues.apache.org/jira/browse/SPARK-20075, Spark's --packages option does not allow specifying a classifier. This will probably require an ugly workaround of using a separate version name.

barrybecker4 commented 5 years ago

Using the shaded jar finally allowed it to work. I still need to work out a clean way to deploy it with Spark and jobserver, but at least I'm no longer blocked on this. Thanks.