apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

[RFC] MXNet 2.0 JVM Language development #17783

Open lanking520 opened 4 years ago

lanking520 commented 4 years ago

Since MXNet 2.0 development has started, I would like to initiate a discussion about the future development of the JVM languages.

Proposal

  1. Start cleaning up the existing APIs to adapt them to 2.0
  2. Rewrite the whole Scala/Java API from the ground up
  3. Start using DJL (djl.ai) as a frontend for MXNet JVM development
  4. Use DJL's MXNet JNA as the low-level API
  5. Use MXNet JavaCPP as the low-level API
  6. (Feel free to add more...)

Statistics

Scala package

[scala-mxnet download statistics image]

Clojure package

[clojure-mxnet download statistics image]

@gigasquid @terrytangyuan @zachgk @frankfliu @aaronmarkham

lanking520 commented 4 years ago

I would propose Options 3 and 4.

DJL is a new Java framework that builds on top of any engine. It gives Java developers a close-to-numpy experience in Java, and it introduces an interface that defines how to train and run inference on different ML/DL models from Java.
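To make the close-to-numpy claim concrete, here is a minimal sketch using DJL's public NDManager/NDArray API (the classes and methods are from DJL's documented API; the shapes and values are purely illustrative):

import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.types.Shape;

public class NDArrayDemo {
    public static void main(String[] args) {
        // The NDManager owns the native memory backing the arrays it creates
        try (NDManager manager = NDManager.newBaseManager()) {
            NDArray a = manager.create(new float[] {1f, 2f, 3f, 4f}, new Shape(2, 2));
            NDArray b = manager.ones(new Shape(2, 2));
            NDArray c = a.add(b).mul(2);   // element-wise ops, numpy style
            System.out.println(c);
        }
    }
}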

In the engine layer, we implemented an MXNet-specific engine that allows users to access most of the up-to-date functionality:

[MXNet-specific feature list image]

With the benefits listed above, I would recommend Option 3, the DJL path, since it already covers the most up-to-date MXNet features and supports all combinations of symbolic/imperative training and inference.

For Option 4: I am also thinking of bringing our JNA layer back to MXNet so the community can build their own Java/Scala frontends if they don't like DJL.

terrytangyuan commented 4 years ago

I propose options 1 and 2, since it took us a lot of effort to bring MXNet to Scala originally and there are already adopters of the Scala API in industry (some may not have been disclosed yet). But I am open to other options. I am not familiar with DJL, though I assume @frankfliu and @lanking520 are the masters behind it.

gigasquid commented 4 years ago

@lanking520 thanks for the clarification above. A further question - how do you envision a current Scala MXNet user migrating their code? Is it going to be mostly reusable, or will it be a complete rewrite for them?

zachgk commented 4 years ago

It is going to be closer to a complete rewrite. On the other hand, a new Scala API would be imperative instead of symbolic, and I think there are going to be a lot of operator changes to better match numpy in 2.0. I don't think the migration cost for a Scala 2.0 would be that much lower anyway.

For users who don't want a full rewrite, they can continue using an old release or whatever new releases we make on the v1.x branch.

gigasquid commented 4 years ago

For the Clojure package, it is a lot easier to interop with Java than with Scala - so if the base everything is using is Java, it will be better for Clojure.

szha commented 4 years ago

+1 for options 1 and 2. Also +1 for 4, as long as it doesn't add a dependency.

My concern with 3 and 4 is that DJL is a separate project with its own release cycle. Having it support MXNet's inference will cause delays while DJL upgrades to the latest version. It will also complicate testing and validation.

Overall, I think a minimal set of APIs, at least for inference, is needed for MXNet JVM ecosystem users.

leezu commented 4 years ago

Another data point is that all of our Scala tests fail randomly with src/c_api/c_api_profile.cc:141: Check failed: !thread_profiling_data.calls_.empty():, so there seem to be some underlying issues.

https://github.com/apache/incubator-mxnet/issues/17067

leezu commented 4 years ago

Another data point is that we currently only support OpenJDK 8, but the JVM languages are broken with OpenJDK 11, which is used on Ubuntu 18.04 for example. See https://github.com/apache/incubator-mxnet/issues/18153

lanking520 commented 4 years ago

@szha For option 4, I would recommend consuming the JNA layer as a submodule from DJL. I am not sure whether this recommendation counts as "adding a dependency to MXNet".

There are two key reasons supporting that:

  1. DJL moves really fast, so we can change the JNA layer quickly whenever needed, compared to the merging speed in MXNet.

  2. Consuming it as a submodule means the MXNet community doesn't have to take on much of the maintenance. The DJL team will regularly provide a JAR for MXNet users to consume.

We can also contribute the code back to the MXNet repo, since it is open source, but we may still keep a copy in our repo for fast iteration, which may cause the JNA layer versions to diverge.

Overall, my recommendation for option 4 leans towards consuming the DJL JNA as a submodule.

szha commented 4 years ago

@lanking520 would it create a circular dependency? And how stable is the JNA layer, and what changes are expected? It would be great if you could share a pointer to the JNA code to help clarify these concerns.

lanking520 commented 4 years ago

There is no hand-written code for the JNA layer; everything is generated. This enforces a consistent standard and a minimal layer over C, avoiding errors and mistakes.

You can find more information about the JNA setup here: jnarator. We built an entire project for the JNA generation pipeline; all we need to build everything is a header file from MXNet. The dependencies required by the Gradle build are minimal, as you can find here.
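To make the "everything is generated" point concrete, the generated JNA layer is essentially a Java interface mirroring the C header. A hand-written equivalent for one simple call from include/mxnet/c_api.h would look roughly like this (a sketch for illustration only; the real bindings are produced by jnarator over the whole header):

import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.ptr.IntByReference;

public class JnaSketch {
    // Direct mapping of int MXGetVersion(int* out) from mxnet/c_api.h
    public interface MxnetLibrary extends Library {
        int MXGetVersion(IntByReference out);
    }

    public static void main(String[] args) {
        // Loads libmxnet.so / libmxnet.dylib / mxnet.dll from the library path
        MxnetLibrary mxnet = Native.load("mxnet", MxnetLibrary.class);
        IntByReference version = new IntByReference();
        mxnet.MXGetVersion(version);
        System.out.println("MXNet version: " + version.getValue());
    }
}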

To address the concern about stability, we tested DJL MXNet with a 100-hour inference run on a server and it remained stable. The training experience is also smooth; a 48-hour multi-GPU run was stable as well. Performance is very close to Python for large models and may bring a huge boost if the model is at or below "squeezenet level".

@frankfliu can bring more information about the JNA layer.

szha commented 4 years ago

My understanding is that DJL depends on MXNet, so if you want to bring the JNA layer from DJL into MXNet as a 3rd-party module, it will create a circular dependency. In terms of stability, I was referring to the development of the code base rather than the performance.

saudet commented 4 years ago

Hi, instead of JNA, I would be happy to provide bindings for the C API and maintain packages based on the JavaCPP Presets here: https://github.com/bytedeco/javacpp-presets/tree/master/mxnet

JavaCPP adds no overhead, unlike JNA, and is often faster than manually written JNI. Plus, JavaCPP provides more tools than JNA to automate the process of parsing header files as well as packaging native libraries in JAR files. I have been maintaining modules for TensorFlow based on JavaCPP, and we actually got a boost in performance when compared to the original JNI code: https://github.com/tensorflow/java/pull/18#issuecomment-579600568

I would be able to do the same for MXNet and maintain the result in a repository of your choice. Let me know if this sounds interesting! BTW, the developers of DJL also seem open to switching from JNA to JavaCPP, even though it is not a huge priority. Still, standardizing how native bindings are created and loaded with other libraries for which JavaCPP is pretty much already the standard (such as OpenCV, TensorFlow, CUDA, FFmpeg, LLVM, Tesseract) could go a long way in alleviating concerns about stability.

szha commented 4 years ago

@saudet this looks awesome! An 18% improvement in throughput is quite significant for switching the way of integration for a frontend binding. I think we should definitely start with this offering. @lanking520 @gigasquid what do you think?

gigasquid commented 4 years ago

@saudet @szha - I think that would be a good path forward (from the Clojure perspective)

lanking520 commented 4 years ago

@saudet Thanks for your proposal. I have a few questions I would like to ask you:

  1. If we adopt the JavaCPP package, how will it be consumed? Under bytedeco or Apache MXNet? Essentially, from our previous discussion, we really don't want another 3rd-party check-in.

  2. Can you also do a benchmark of the MXNet API's performance and possibly share reproducible code? We did test the performance of JavaCPP vs JNA vs JNI and didn't see much difference (under 10%).

    • MXImperativeInvokeEx

    • CachedOpForward

    The above two methods are the most frequently used calls for a minimal inference request; please try these two to see how the performance goes.

  3. We do have an additional technical issue with JavaCPP; is there any plan to fix it? (I will put it into a separate comment since it is really big.)

  4. How do you ensure performance if the build flags are different? For example, MXNet has to be built from source (with the necessary modifications to the source code) in order to work with JavaCPP.

  5. Regarding the dependencies issue, can we go without the additional OpenCV and OpenBLAS in the package?

lanking520 commented 4 years ago

What's inside of javacpp-presets-mxnet

What's missing

javacpp-presets-mxnet doesn't expose APIs from nnvm/c_api.h (some of the current python/gluon APIs depend on APIs in nnvm/c_api.h)

What are the dependencies

org.bytedeco.mxnet:ImageClassificationPredict:jar:1.5-SNAPSHOT
+- org.bytedeco:mxnet-platform:jar:1.4.0-1.5-SNAPSHOT:compile
|  +- org.bytedeco:opencv-platform:jar:4.0.1-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:opencv:jar:android-arm:4.0.1-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:opencv:jar:android-arm64:4.0.1-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:opencv:jar:android-x86:4.0.1-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:opencv:jar:android-x86_64:4.0.1-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:opencv:jar:ios-arm64:4.0.1-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:opencv:jar:ios-x86_64:4.0.1-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:opencv:jar:linux-x86:4.0.1-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:opencv:jar:linux-x86_64:4.0.1-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:opencv:jar:linux-armhf:4.0.1-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:opencv:jar:linux-ppc64le:4.0.1-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:opencv:jar:macosx-x86_64:4.0.1-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:opencv:jar:windows-x86:4.0.1-1.5-SNAPSHOT:compile
|  |  \- org.bytedeco:opencv:jar:windows-x86_64:4.0.1-1.5-SNAPSHOT:compile
|  +- org.bytedeco:openblas-platform:jar:0.3.5-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:openblas:jar:android-arm:0.3.5-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:openblas:jar:android-arm64:0.3.5-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:openblas:jar:android-x86:0.3.5-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:openblas:jar:android-x86_64:0.3.5-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:openblas:jar:ios-arm64:0.3.5-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:openblas:jar:ios-x86_64:0.3.5-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:openblas:jar:linux-x86:0.3.5-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:openblas:jar:linux-x86_64:0.3.5-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:openblas:jar:linux-armhf:0.3.5-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:openblas:jar:linux-ppc64le:0.3.5-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:openblas:jar:macosx-x86_64:0.3.5-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:openblas:jar:windows-x86:0.3.5-1.5-SNAPSHOT:compile
|  |  \- org.bytedeco:openblas:jar:windows-x86_64:0.3.5-1.5-SNAPSHOT:compile
|  +- org.bytedeco:mkl-dnn-platform:jar:0.18.1-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:mkl-dnn:jar:linux-x86_64:0.18.1-1.5-SNAPSHOT:compile
|  |  +- org.bytedeco:mkl-dnn:jar:macosx-x86_64:0.18.1-1.5-SNAPSHOT:compile
|  |  \- org.bytedeco:mkl-dnn:jar:windows-x86_64:0.18.1-1.5-SNAPSHOT:compile
|  \- org.bytedeco:mxnet:jar:1.4.0-1.5-SNAPSHOT:compile
\- org.bytedeco:mxnet:jar:macosx-x86_64:1.4.0-1.5-SNAPSHOT:compile
   +- org.bytedeco:opencv:jar:4.0.1-1.5-SNAPSHOT:compile
   +- org.bytedeco:openblas:jar:0.3.5-1.5-SNAPSHOT:compile
   +- org.bytedeco:mkl-dnn:jar:0.18.1-1.5-SNAPSHOT:compile
   +- org.bytedeco:javacpp:jar:1.5-SNAPSHOT:compile
   +- org.slf4j:slf4j-simple:jar:1.7.25:compile
   |  \- org.slf4j:slf4j-api:jar:1.7.25:compile
   \- org.scala-lang:scala-library:jar:2.11.12:compile

Build the project from source

I spent 40 minutes building the project on my Mac and had to apply some hacks to get it to build.

Classes

See javadoc: http://bytedeco.org/javacpp-presets/mxnet/apidocs/

  1. The Java class name is “mxnet”, which does not follow Java naming conventions
  2. Each pointer has a corresponding Java class, which is arguable. Exposing them as strongly typed classes is necessary if they are meant to be used directly by end developers, but they really should only be an internal implementation detail of the API. It is overkill to expose them as a type instead of just a pointer.
  3. All the classes (except mxnet.java) are hand-written.
  4. The API mappings are hand-coded as well.

Performance

Native library loading with JavaCPP takes a long time: it takes 2.6 seconds on average to initialize libmxnet.so with JavaCPP.

Loader.load(org.bytedeco.mxnet.global.mxnet.class);
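A trivial harness like the following is enough to reproduce this measurement (a sketch around the same Loader.load call shown above; nothing here is MXNet-specific beyond that class):

import org.bytedeco.javacpp.Loader;

public class LoadTiming {
    public static void main(String[] args) {
        long start = System.nanoTime();
        // Extracts and loads libmxnet.so plus the native libraries it depends on
        Loader.load(org.bytedeco.mxnet.global.mxnet.class);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Loader.load took " + elapsedMs + " ms");
    }
}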

Issues

The open source code on GitHub doesn't match the binary release on Maven Central:

  • the Maven group and the Java package name are different.

  • the C predict API is not included in the Maven version.

  • the example code doesn't work with the Maven artifacts; it can only be built with a snapshot version locally.

saudet commented 4 years ago

@saudet Thanks for your proposal. I have a few questions I would like to ask you:

  1. If we adopt the JavaCPP package, how will it be consumed? Under bytedeco or Apache MXNet? Essentially, from our previous discussion, we really don't want another 3rd-party check-in.

We can go either way, but I have found that contemporary projects like Deeplearning4j, MXNet, PyTorch, or TensorFlow that need to develop high-level APIs on top of something like JavaCPP prefer to have control over everything in their own repositories, and use JavaCPP pretty much like one would use Cython or pybind11 with setuptools for Python.

I started the JavaCPP Presets because for traditional projects such as OpenCV, FFmpeg, LLVM, etc., high-level APIs for languages other than C/C++ are not being developed as part of those projects. I also realized the Java community needed something like Anaconda...

  2. Can you also do a benchmark of the MXNet API's performance and possibly share reproducible code? We did test the performance of JavaCPP vs JNA vs JNI and didn't see much difference (under 10%).

    • MXImperativeInvokeEx

    • CachedOpForward

The above two methods are the most frequently used calls for a minimal inference request; please try these two to see how the performance goes.

If you're doing only batch operations, as would be the case for Python bindings, you're not going to see much difference, no. What you need to look at are things like the Indexer package, which allows us to implement fast custom operations in Java like this: http://bytedeco.org/news/2014/12/23/third-release/ You're not going to be able to do that with JNA or JNI without essentially rewriting that sort of thing.
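For instance, a custom element-wise operation over native memory can be written directly in Java with the Indexer classes (a minimal sketch against a plain FloatPointer rather than MXNet's own buffers; FloatPointer and FloatIndexer are part of JavaCPP itself):

import org.bytedeco.javacpp.FloatPointer;
import org.bytedeco.javacpp.indexer.FloatIndexer;

public class IndexerSketch {
    public static void main(String[] args) {
        final int n = 1_000_000;
        try (FloatPointer data = new FloatPointer(n);          // natively allocated buffer
             FloatIndexer idx = FloatIndexer.create(data)) {   // typed accessor over that buffer
            for (long i = 0; i < n; i++) {
                idx.put(i, i * 0.5f);                          // write into native memory
            }
            float sum = 0;
            for (long i = 0; i < n; i++) {
                sum += idx.get(i);                             // read back without copying to a Java array
            }
            System.out.println("sum = " + sum);
        }
    }
}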

  3. We do have an additional technical issue with JavaCPP; is there any plan to fix it? (I will put it into a separate comment since it is really big.)

  4. How do you ensure performance if the build flags are different? For example, MXNet has to be built from source (with the necessary modifications to the source code) in order to work with JavaCPP.

  5. Regarding the dependencies issue, can we go without the additional OpenCV and OpenBLAS in the package?

Yes, those are the kinds of issues that are best dealt with by using only JavaCPP as a low-level tool, instead of the presets, which are basically a high-level distribution like Anaconda.

saudet commented 4 years ago

What's missing

javacpp-presets-mxnet doesn't expose APIs from nnvm/c_api.h (some of the current python/gluon APIs depend on APIs in nnvm/c_api.h)

I've added that the other day, thanks to @frankfliu for pointing this out: https://github.com/bytedeco/javacpp-presets/commit/976e6f7d307b3f3855f39413c494d8f482c9adf6

See javadoc: http://bytedeco.org/javacpp-presets/mxnet/apidocs/

  1. The Java class name is “mxnet”, which does not follow Java naming conventions

That's not hardcoded. We can use whatever name we want for that class.

  2. Each pointer has a corresponding Java class, which is arguable. Exposing them as strongly typed classes is necessary if they are meant to be used directly by end developers, but they really should only be an internal implementation detail of the API. It is overkill to expose them as a type instead of just a pointer.

We can map everything to Pointer, that's not a problem either.

  3. All the classes (except mxnet.java) are hand-written.

No, they are not. Everything in the src/gen directory here is generated at build time: https://github.com/bytedeco/javacpp-presets/tree/master/mxnet/src/gen/java/org/bytedeco/mxnet

  4. The API mappings are hand-coded as well.

If you're talking about this file, yes, that's the only thing that is written manually: https://github.com/bytedeco/javacpp-presets/blob/master/mxnet/src/main/java/org/bytedeco/mxnet/presets/mxnet.java (The formatting is a bit crappy, I haven't touched it in a while, but we can make it look prettier like this: https://github.com/bytedeco/javacpp-presets/blob/master/onnxruntime/src/main/java/org/bytedeco/onnxruntime/presets/onnxruntime.java )

Performance

Native library loading with JavaCPP takes a long time: it takes 2.6 seconds on average to initialize libmxnet.so with JavaCPP.

Loader.load(org.bytedeco.mxnet.global.mxnet.class);

Something's wrong, that takes less than 500 ms on my laptop, and that includes loading OpenBLAS, OpenCV, and a lookup for CUDA and MKL, which can obviously be optimized... In any case, we can debug that later to see what is going wrong on your end.

Issues

The open source code on GitHub doesn't match the binary release on Maven Central:

  • the Maven group and the Java package name are different.

Both the group ID and the package names are org.bytedeco, but in any case, if that gets maintained somewhere here, I imagine it would be changed to something like org.apache.mxnet.xyz.internal.etc

  • the C predict API is not included in the Maven version.

Yes it is: http://bytedeco.org/javacpp-presets/mxnet/apidocs/org/bytedeco/mxnet/global/mxnet.html

  • the example code doesn't work with the Maven artifacts; it can only be built with a snapshot version locally.

https://github.com/bytedeco/javacpp-presets/tree/master/mxnet/samples works fine for me on Linux:

$ mvn -U clean compile exec:java -Djavacpp.platform.custom -Djavacpp.platform.host -Dexec.args=apple.jpg
...
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet-platform/1.7.0.rc1-1.5.4-SNAPSHOT/maven-metadata.xml
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet-platform/1.7.0.rc1-1.5.4-SNAPSHOT/maven-metadata.xml (1.3 kB at 2.5 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet-platform/1.7.0.rc1-1.5.4-SNAPSHOT/mxnet-platform-1.7.0.rc1-1.5.4-20200725.115300-20.pom
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet-platform/1.7.0.rc1-1.5.4-SNAPSHOT/mxnet-platform-1.7.0.rc1-1.5.4-20200725.115300-20.pom (4.7 kB at 9.3 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp-presets/1.5.4-SNAPSHOT/maven-metadata.xml
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp-presets/1.5.4-SNAPSHOT/maven-metadata.xml (610 B at 1.5 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp-presets/1.5.4-SNAPSHOT/javacpp-presets-1.5.4-20200725.155410-6590.pom
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp-presets/1.5.4-SNAPSHOT/javacpp-presets-1.5.4-20200725.155410-6590.pom (84 kB at 91 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/opencv-platform/4.4.0-1.5.4-SNAPSHOT/maven-metadata.xml
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/opencv-platform/4.4.0-1.5.4-SNAPSHOT/maven-metadata.xml (1.2 kB at 2.6 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/opencv-platform/4.4.0-1.5.4-SNAPSHOT/opencv-platform-4.4.0-1.5.4-20200725.082627-40.pom
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/opencv-platform/4.4.0-1.5.4-SNAPSHOT/opencv-platform-4.4.0-1.5.4-20200725.082627-40.pom (7.9 kB at 19 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/openblas-platform/0.3.10-1.5.4-SNAPSHOT/maven-metadata.xml
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/openblas-platform/0.3.10-1.5.4-SNAPSHOT/maven-metadata.xml (1.3 kB at 2.1 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/openblas-platform/0.3.10-1.5.4-SNAPSHOT/openblas-platform-0.3.10-1.5.4-20200724.193951-177.pom
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/openblas-platform/0.3.10-1.5.4-SNAPSHOT/openblas-platform-0.3.10-1.5.4-20200724.193951-177.pom (7.9 kB at 16 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp-platform/1.5.4-SNAPSHOT/maven-metadata.xml
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp-platform/1.5.4-SNAPSHOT/maven-metadata.xml (1.2 kB at 2.8 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp-platform/1.5.4-SNAPSHOT/javacpp-platform-1.5.4-20200720.164410-35.pom
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp-platform/1.5.4-SNAPSHOT/javacpp-platform-1.5.4-20200720.164410-35.pom (60 kB at 112 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp/1.5.4-SNAPSHOT/maven-metadata.xml
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp/1.5.4-SNAPSHOT/maven-metadata.xml (4.3 kB at 8.1 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp/1.5.4-SNAPSHOT/javacpp-1.5.4-20200725.222627-485.pom
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp/1.5.4-SNAPSHOT/javacpp-1.5.4-20200725.222627-485.pom (20 kB at 52 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/openblas/0.3.10-1.5.4-SNAPSHOT/maven-metadata.xml
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/openblas/0.3.10-1.5.4-SNAPSHOT/maven-metadata.xml (4.2 kB at 6.9 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/openblas/0.3.10-1.5.4-SNAPSHOT/openblas-0.3.10-1.5.4-20200725.222937-191.pom
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/openblas/0.3.10-1.5.4-SNAPSHOT/openblas-0.3.10-1.5.4-20200725.222937-191.pom (4.8 kB at 9.9 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/opencv/4.4.0-1.5.4-SNAPSHOT/maven-metadata.xml
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/opencv/4.4.0-1.5.4-SNAPSHOT/maven-metadata.xml (4.6 kB at 9.7 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/opencv/4.4.0-1.5.4-SNAPSHOT/opencv-4.4.0-1.5.4-20200725.222953-47.pom
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/opencv/4.4.0-1.5.4-SNAPSHOT/opencv-4.4.0-1.5.4-20200725.222953-47.pom (11 kB at 23 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet/1.7.0.rc1-1.5.4-SNAPSHOT/maven-metadata.xml
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet/1.7.0.rc1-1.5.4-SNAPSHOT/maven-metadata.xml (2.6 kB at 5.7 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet/1.7.0.rc1-1.5.4-SNAPSHOT/mxnet-1.7.0.rc1-1.5.4-20200725.222844-30.pom
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet/1.7.0.rc1-1.5.4-SNAPSHOT/mxnet-1.7.0.rc1-1.5.4-20200725.222844-30.pom (15 kB at 28 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet-platform/1.7.0.rc1-1.5.4-SNAPSHOT/mxnet-platform-1.7.0.rc1-1.5.4-20200725.115300-20.jar
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/opencv/4.4.0-1.5.4-SNAPSHOT/opencv-4.4.0-1.5.4-20200725.222953-47-linux-x86_64.jar
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/openblas-platform/0.3.10-1.5.4-SNAPSHOT/openblas-platform-0.3.10-1.5.4-20200724.193951-177.jar
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/opencv/4.4.0-1.5.4-SNAPSHOT/opencv-4.4.0-1.5.4-20200725.222953-47.jar
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/opencv-platform/4.4.0-1.5.4-SNAPSHOT/opencv-platform-4.4.0-1.5.4-20200725.082627-40.jar
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet-platform/1.7.0.rc1-1.5.4-SNAPSHOT/mxnet-platform-1.7.0.rc1-1.5.4-20200725.115300-20.jar (3.4 kB at 8.6 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp-platform/1.5.4-SNAPSHOT/javacpp-platform-1.5.4-20200720.164410-35.jar
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp-platform/1.5.4-SNAPSHOT/javacpp-platform-1.5.4-20200720.164410-35.jar (6.1 kB at 7.7 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp/1.5.4-SNAPSHOT/javacpp-1.5.4-20200725.222627-485-linux-x86_64.jar
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/opencv-platform/4.4.0-1.5.4-SNAPSHOT/opencv-platform-4.4.0-1.5.4-20200725.082627-40.jar (3.6 kB at 3.4 kB/s)
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/openblas-platform/0.3.10-1.5.4-SNAPSHOT/openblas-platform-0.3.10-1.5.4-20200724.193951-177.jar (3.6 kB at 3.4 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/openblas/0.3.10-1.5.4-SNAPSHOT/openblas-0.3.10-1.5.4-20200725.222937-191.jar
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/openblas/0.3.10-1.5.4-SNAPSHOT/openblas-0.3.10-1.5.4-20200725.222937-191-linux-x86_64.jar
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp/1.5.4-SNAPSHOT/javacpp-1.5.4-20200725.222627-485-linux-x86_64.jar (25 kB at 21 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet/1.7.0.rc1-1.5.4-SNAPSHOT/mxnet-1.7.0.rc1-1.5.4-20200725.222844-30.jar
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/openblas/0.3.10-1.5.4-SNAPSHOT/openblas-0.3.10-1.5.4-20200725.222937-191.jar (170 kB at 76 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp/1.5.4-SNAPSHOT/javacpp-1.5.4-20200725.222627-485.jar
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/javacpp/1.5.4-SNAPSHOT/javacpp-1.5.4-20200725.222627-485.jar (467 kB at 151 kB/s)
Downloading from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet/1.7.0.rc1-1.5.4-SNAPSHOT/mxnet-1.7.0.rc1-1.5.4-20200725.222844-30-linux-x86_64.jar
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/opencv/4.4.0-1.5.4-SNAPSHOT/opencv-4.4.0-1.5.4-20200725.222953-47.jar (1.6 MB at 509 kB/s)
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet/1.7.0.rc1-1.5.4-SNAPSHOT/mxnet-1.7.0.rc1-1.5.4-20200725.222844-30.jar (3.3 MB at 706 kB/s)
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/openblas/0.3.10-1.5.4-SNAPSHOT/openblas-0.3.10-1.5.4-20200725.222937-191-linux-x86_64.jar (14 MB at 1.7 MB/s)
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet/1.7.0.rc1-1.5.4-SNAPSHOT/mxnet-1.7.0.rc1-1.5.4-20200725.222844-30-linux-x86_64.jar (44 MB at 2.0 MB/s)
Downloaded from sonatype-nexus-snapshots: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/opencv/4.4.0-1.5.4-SNAPSHOT/opencv-4.4.0-1.5.4-20200725.222953-47-linux-x86_64.jar (26 MB at 1.1 MB/s)
...
Best Result: Granny Smith (id=948, accuracy=0.96502399)
run successfully

What is the error that you're getting? I've also tested on Mac just now and still no problems.

lanking520 commented 4 years ago

@saudet Thanks for your reply. Still, I am concerned about the first question:

You mentioned:

We can go either way, but I have found that contemporary projects like Deeplearning4j, MXNet, PyTorch, or TensorFlow that need to develop high-level APIs on top of something like JavaCPP prefer to have control over everything in their own repositories, and use JavaCPP pretty much like one would use Cython or pybind11 with setuptools for Python.

We are looking for a robust solution for MXNet Java developers, one that is owned and maintained by the Apache MXNet community. I would be more than happy if you would like to contribute the source code that generates the MXNet JavaCPP package to this repo, so that we own the maintenance and are responsible to end users for the package being reliable.

At the beginning, we discussed several ways to preserve a low-level Java API for MXNet that anyone who uses Java can start with. Most of the problems were around ownership and maintenance. I have added the JavaCPP option as option 5 so we can see which one works best in the end.

terrytangyuan commented 4 years ago

This is a great discussion. Thanks @lanking520 for initiating it. Perhaps we can define some key metrics here so we can compare the solutions later?

saudet commented 4 years ago

We are looking for a robust solution for MXNet Java developers, one that is owned and maintained by the Apache MXNet community. I would be more than happy if you would like to contribute the source code that generates the MXNet JavaCPP package to this repo, so that we own the maintenance and are responsible to end users for the package being reliable.

At the beginning, we discussed several ways to preserve a low-level Java API for MXNet that anyone who uses Java can start with. Most of the problems were around ownership and maintenance. I have added the JavaCPP option as option 5 so we can see which one works best in the end.

Sounds good, thanks! If you have any specific concerns about the above, please let me know. JNA seems to be maintained by a single person with apparently no connections to the AI industry (https://dzone.com/articles/scratch-netbeans-itch-matthias), whereas, as part of my work, I already maintain APIs mainly for OpenCV, FFmpeg, ONNX Runtime, and TensorFlow at the moment (and others as well; it tends to vary with time), MXNet could eventually become one of those, and I have users paying for commercial support of proprietary libraries too. So I think JavaCPP is the better option here, but I'm obviously biased. :)

hmf commented 3 years ago

I would like to add my two cents.

With regard to the language, I am partial to Scala because:

However, Scala comes with some issues:

So the use of a more imperative (as opposed to functional) programming style may be the way to go (simpler code, and let's not get into the issues of composability :-) ). If this is the route, then maybe a Java interface will suffice for Scala users. Care must be taken to ensure easy interoperability. One may also add a thin wrapper to promote an idiomatic Scala programming style (possibly maximizing type inference). For an example of such a case, see ScalaFX.

Having said that, I am still partial to Scala. However, one more issue should be highlighted: the new Scala 3 is poised for release by the end of this year. I have been using it and it is quite stable. It also offers interoperability with at least Scala 2.12.x and 2.13.x. Type inference seems more intuitive, some constructs have been removed to simplify the language, it allows coding without braces (like Python), meta-programming is much improved, compilation is faster, etc.

So my suggestion is that if Scala is used (full blown API or wrapper), consider using Scala 3.

Hope this is useful.

lanking520 commented 3 years ago

Hi all, after talking with @frankfliu, I think we can donate DJL's MXNet JNA (source code) to Apache MXNet for the low-level Java support. With that, Apache MXNet will be able to build from source and generate the entire low-level Java frontend API for users. The good part is its maintenance cost: it requires very minimal dependencies and everything is generated:

https://github.com/awslabs/djl/tree/master/mxnet/jnarator

Users will be able to leverage that directly.

From the usability point of view, JavaCPP will also offer similar functionality. @saudet would you be open to considering whether MXNet JavaCPP can be fully donated to Apache MXNet and maintained by the community?

These two solutions are very similar, and JavaCPP can potentially bring some level of performance improvement. My only concern with JavaCPP is @szha's initial question: how are we going to build/maintain the Java package?

Having a Java API is essential to the community since it can be used by many more JVM-based languages. @hmf I would +1 the usability points you mentioned for Scala. However, historically, when we tried to build a Java-compatible API from MXNet Scala, there were huge obstacles blocking us from moving forward. Some Scala representations require additional effort to be used from Java, whereas using Java from Scala is 100% supported.

saudet commented 3 years ago

From the usability point of view, JavaCPP will also offer similar functionality. @saudet would you be open to considering whether MXNet JavaCPP can be fully donated to Apache MXNet and maintained by the community?

Yes, that's what I'm offering. I would sign up as a contributor to make an initial contribution and also do what needs to be done to get it working for your purposes.

These two solutions are very similar, and JavaCPP can potentially bring some level of performance improvement. My only concern with JavaCPP is @szha's initial question: how are we going to build/maintain the Java package.

I'm sorry, I must have missed that question, and I'm not sure I see it above. Can you point it out just to be sure? I've since released a Gradle plugin, though, and I think that answers the question. Here is an example that builds and packages bindings for zlib:

Just pop that into a script for GitHub Actions or whatnot and it works. The build.sh script does not need to build the libraries from scratch either. It can be modified to download and extract existing binaries, for example, like it does in the case of MKLML here:

Does this look satisfactory?

lanking520 commented 3 years ago

@saudet Sounds good to me.

So can I say that all the source code needed for the build will be inside Apache MXNet? We may only import some other Java packages that are necessary for the build, instead of consuming them as a submodule.

However, during MXNet 1.x we faced one key issue: we had to separate the native binary from the Java package due to license issues. As far as I know, JavaCPP requires both of them to be built together. Can we separate the native binary from JavaCPP when we publish, and make it possible for users to provide their own native binary (e.g. grabbed from a pip wheel)? This requirement typically comes from GPU use cases, where we have different GPU distributions for different Arch/OS combinations (CU101/CU102/CU110).

To make it fully usable, there will be some additional tasks for JavaCPP, such as type-mapping all the Java handle objects onto JavaCPP pointers. Will we have something similar for MXNet?

saudet commented 3 years ago

So can I say that all the source code needed for the build will be inside Apache MXNet? We may only import some other Java packages that are necessary for the build, instead of consuming them as a submodule.

I'm not sure I follow. Are you saying that Apache MXNet is not allowed to have dependencies? I see plenty of dependencies being downloaded and compiled when building MXNet. If you're saying JavaCPP cannot be one of those, why is that? If there is a reason that we can't go through Maven Central, we could have https://github.com/bytedeco/javacpp added as a git submodule here. Would that be acceptable?

However, during MXNet 1.x we faced one key issue: we had to separate the native binary from the Java package due to license issues. As far as I know, JavaCPP requires both of them to be built together. Can we separate the native binary from JavaCPP when we publish, and make it possible for users to provide their own native binary (e.g. grabbed from a pip wheel)? This requirement typically comes from GPU use cases, where we have different GPU distributions for different Arch/OS combinations (CU101/CU102/CU110).

Sure, JavaCPP will load from the system path if it can't find what it needs on the class path. From the perspective of the build system bundling libraries into JAR files, we can control that with the copyLibs flag, like this: https://github.com/bytedeco/gradle-javacpp/blob/master/samples/zlib/build.gradle#L42 Setting it to false will prevent it from bundling any library it links with or needs to preload.

To make it fully usable, there will be some additional tasks for JavaCPP, such as type-mapping all the Java handle objects onto JavaCPP pointers. Will we have something similar for MXNet?

Yes, we can map everything to Pointer, that's not a problem. I could start to put something in a fork, I imagine in a java subdirectory similar to https://github.com/apache/incubator-mxnet/tree/master/python?
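To illustrate what that mapping looks like on the config side, a JavaCPP presets class along these lines tells the parser to treat the opaque MXNet handle typedefs as plain Pointer instead of generating a class per handle (a sketch only; the handle names come from c_api.h, and the exact Info flags in the real presets file may differ):

import org.bytedeco.javacpp.annotation.Platform;
import org.bytedeco.javacpp.annotation.Properties;
import org.bytedeco.javacpp.tools.Info;
import org.bytedeco.javacpp.tools.InfoMap;
import org.bytedeco.javacpp.tools.InfoMapper;

@Properties(
    target = "org.apache.mxnet.internal.c_api",   // package for the generated classes
    value = @Platform(include = "mxnet/c_api.h", link = "mxnet")
)
public class mxnet implements InfoMapper {
    @Override
    public void map(InfoMap infoMap) {
        // Keep the opaque handles as untyped pointers rather than strongly typed wrapper classes
        infoMap.put(new Info("NDArrayHandle", "SymbolHandle", "CachedOpHandle")
                .cast().valueTypes("Pointer"));
    }
}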

hmf commented 3 years ago

@lanking520 With regard to the Scala API, access via Java is just fine. I am sure someone with the itch may end up providing a Scala wrapper 8-)

lanking520 commented 3 years ago

@saudet if it is consumed as a Maven package, it should be fine as long as the license doesn't fall into a category the ASF doesn't approve of (no license, GPL, LGPL, etc.). I would +1 the JavaCPP solution you have described. One last question is the maintenance cost: since JavaCPP does the generation work, how much maintenance would it require from the community to keep it here?

@hmf Sure, please go ahead and create one if you feel it is necessary once we have the Java API.

So I would like to summarize the topic here:

  1. Go for the JavaCPP solution for its better performance. The source code will also be part of Apache MXNet. In 2.0, we would expect a CI/CD pipeline for the MXNet low-level Java API.

  2. Go for the JNA build pipeline for the community; it can be used out of the box now without issues. Similarly, the maintenance is very low and fewer dependencies are required. The source code can also be donated to Apache MXNet.

Both solutions target the MXNet low-level Java API.

@gigasquid @leezu @szha @zachgk @terrytangyuan @yzhliu Any thoughts?

saudet commented 3 years ago

@saudet if it is consumed as a Maven package, it should be fine as long as the license doesn't fall into a category the ASF doesn't approve of (no license, GPL, LGPL, etc.).

Great! Thanks for the clarification. It's Apache v2, so the license is alright.

I would +1 the JavaCPP solution you have described. One last question is the maintenance cost: since JavaCPP does the generation work, how much maintenance would it require from the community to keep it here?

I've created a branch with a fully functional build that bundles MXNet with wrappers for the C API, on my fork here: https://github.com/saudet/incubator-mxnet/tree/add-javacpp It uses the defaults for CMake, but without CUDA or OpenCV, and I'm guessing it works on Mac and Windows too, but I've only tested on Linux (Fedora), which outputs the following, mapping all declarations of typedef void* to Pointer like you asked:

$ git clone https://github.com/saudet/incubator-mxnet
$ cd incubator-mxnet
$ git checkout add-javacpp
$ cd java
$ gradle clean build --info
...
org.apache.mxnet.internal.c_api.UnitTest > test STANDARD_OUT
    20000
...
BUILD SUCCESSFUL in 1m 3s
10 actionable tasks: 10 executed
...
$ ls -lh build/libs/
total 38M
-rw-rw-r--. 1 saudet saudet 49K Oct  6 20:54 mxnet-2.0-SNAPSHOT.jar
-rw-rw-r--. 1 saudet saudet 38M Oct  6 20:54 mxnet-2.0-SNAPSHOT-linux-x86_64.jar

The number of lines that are directly related to JavaCPP is less than 100, so even if I die anyone can maintain that. I'm sure that's going to grow a bit, but a C API is very easy to maintain. For example, the presets for the C API of TensorFlow 2.x had to be updated only 10 times over the course of the past year: https://github.com/tensorflow/java/blob/master/tensorflow-core/tensorflow-core-api/src/main/java/org/tensorflow/internal/c_api/presets/tensorflow.java

saudet commented 3 years ago

I've pushed changes that show how to use JavaCPP with maven-publish to my fork here: https://github.com/saudet/incubator-mxnet/tree/add-javacpp/java Running gradle publish or something equivalent also deploys an mxnet-platform artifact that can be used this way: https://github.com/bytedeco/javacpp-presets/wiki/Reducing-the-Number-of-Dependencies

For example, with this pom.xml file:

<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.apache</groupId>
    <artifactId>mxnet-sample</artifactId>
    <version>2.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache</groupId>
            <artifactId>mxnet-platform</artifactId>
            <version>2.0-SNAPSHOT</version>
        </dependency>
    </dependencies>
</project>

We can transitively filter out all artifacts that are not for Linux x86_64 this way:

$ mvn dependency:tree -Djavacpp.platform=linux-x86_64
[INFO] Scanning for projects...
[INFO] 
[INFO] ----------------------< org.apache:mxnet-sample >-----------------------
[INFO] Building mxnet-sample 2.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- maven-dependency-plugin:2.8:tree (default-cli) @ mxnet-sample ---
[INFO] org.apache:mxnet-sample:jar:2.0-SNAPSHOT
[INFO] \- org.apache:mxnet-platform:jar:2.0-SNAPSHOT:compile
[INFO]    +- org.bytedeco:javacpp-platform:jar:1.5.5-SNAPSHOT:compile
[INFO]    |  +- org.bytedeco:javacpp:jar:1.5.5-SNAPSHOT:compile
[INFO]    |  \- org.bytedeco:javacpp:jar:linux-x86_64:1.5.5-SNAPSHOT:compile
[INFO]    +- org.apache:mxnet:jar:2.0-SNAPSHOT:compile
[INFO]    \- org.apache:mxnet:jar:linux-x86_64:2.0-SNAPSHOT:compile
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.360 s
[INFO] Finished at: 2020-10-13T21:19:51+09:00
[INFO] ------------------------------------------------------------------------

And we can do the same with the platform plugin of Gradle JavaCPP: https://github.com/bytedeco/gradle-javacpp#the-platform-plugin

gigasquid commented 3 years ago

As far as my feedback for the two options:

  1. Go for the JavaCPP solution for its better performance. The source code will also be part of Apache MXNet. In 2.0, we would expect a CI/CD pipeline for the MXNet low-level Java API.

  2. Go for the JNA build pipeline for the community; it can be used out of the box now without issues. Similarly, the maintenance is very low and fewer dependencies are required. The source code can also be donated to Apache MXNet.

They both sound reasonable and like improvements to the system. Thank you both @lanking520 and @saudet for your time and efforts. The one aspect that I haven't heard discussed is the implementation of the base Java API - in particular, is anyone planning on tackling this? If so, the person or people doing the dev work themselves might have a preference that would weigh it one way or the other.

saudet commented 3 years ago

Here's another potential benefit of going with a tool like JavaCPP. I've started publishing packages for TVM that bundle its Python API and also wrap its C/C++ API:

Currently, the builds have CUDA/cuDNN, LLVM, MKL, and MKL-DNN/DNNL/oneDNN enabled on Linux, Mac, and Windows, but users do not need to install anything at all--not even CPython! All dependencies get downloaded automatically with Maven (although we can use manually installed ones too if we want). It also works out of the box with GraalVM Native Image and Quarkus this way:

For deployment, the TVM Runtime gets built separately, so it's easy to filter everything and get JAR files that are less than 1 MB, without having to recompile anything at all! It's also easy enough to set up the build in a way to offer a user-friendly interface to generate just the right amount of JNI (in addition to enabling only the backends we are interested in) to get even smaller JAR files. The manually written JNI code currently in TVM's repository doesn't support that. Moreover, it is inefficiently written in a similar fashion to the original JNI code in TensorFlow, see above https://github.com/apache/incubator-mxnet/issues/17783#issuecomment-662994965, so we can assume that using JavaCPP is going to provide a similar boost in performance there as well.

If TVM is eventually integrated into MXNet as per, for example, #15465, this might be worth thinking about right now. For most AI projects, Java is used mainly at deployment time, and manually written JNI or automatically generated JNA isn't going to help much in that case.

szha commented 3 years ago

Thanks all for the discussion. @saudet would you help to bootstrap the adoption of javacpp in mxnet to get it off the ground? I'm happy to help facilitate any testing infrastructure work necessary.

saudet commented 3 years ago

@szha Thanks! Could you let me know what would be missing if anything to get this initial contribution into master? https://github.com/saudet/incubator-mxnet/tree/add-javacpp/java Probably a little README.md file would be nice, but other than that?

szha commented 3 years ago

In order for it to be adopted by developers and users, I expect that a new language binding should have the following:

saudet commented 3 years ago

Ok, I'm able to start looking into that.

Well, "language binding", it would basically be just the C API for starters. I think that would be enough for DJL though. @lanking520 @frankfliu Would there be anything specific from your team?

For Jenkins, I assume I'd need to get access to the server and everything to do something with that myself... What about GitHub Actions? I see there is some work going on with those. Are there plans to switch to that?

For the docs, that would be something like the Jenkinsfile_website_java_docs in the v1.x branch? I also see a couple of short Markdown files there for getting started and tutorials, so something like that... using the C API?

szha commented 3 years ago

@saudet for setting up the pipeline, we just need to add a step in existing Jenkinsfiles. I can help facilitate any need for access to the CI.

lanking520 commented 3 years ago

Ok, I'm able to start looking into that.

Well, "language binding", it would basically be just the C API for starters. I think that would be enough for DJL though. @lanking520 @frankfliu Would there be anything specific from your team?

For Jenkins, I assume I'd need to get access to the server and everything to do something with that myself... What about GitHub Actions? I see there is some work going on with those. Are there plans to switch to that?

For the docs, that would be something like the Jenkinsfile_website_java_docs in the v1.x branch? I also see a couple of short Markdown files there for getting started and tutorials, so something like that... using the C API?

I would recommend providing a basic Java interface that allows all Java developers to build a frontend on top of it. As Sheng mentioned, you can start with the Jenkins template and add a Java publish job to it.

saudet commented 3 years ago

I don't really want to deal with CI, especially Jenkins; it's a major time sink and completely unnecessary with services like GitHub Actions these days, but let's see if I can figure out what needs to be done. If I take the Jenkinsfile_centos_cpu script for Python, it ends up calling functions from here, which basically install environments, run builds, and execute stuff for Python: https://github.com/apache/incubator-mxnet/blob/master/ci/docker/runtime_functions.sh Is my understanding correct that these scripts are going to need some refactoring to be able to reuse some of that for Java?

If I follow my instincts, I think it's probably going to be easier to look at what's been done for the other minor bindings, such as Julia, but I'm not seeing anything in the Jenkins files for that one: https://github.com/apache/incubator-mxnet/search?q=julia How does that one work?

BTW, there's one thing we've neglected to cover. I was under the impression that MXNet was using Cython to access the C API for its Python binding, but it looks like it's using ctypes. TensorFlow started with SWIG, and now uses pybind11, and the closest Java equivalent for those is JavaCPP, that is they support C++ by generating additional code for bindings at build time, so it makes sense to use JavaCPP in the case of TensorFlow to be able to follow what the core developers are doing for Python.

On the other hand, if MXNet uses ctypes for Python, and has no intention of changing, the closest equivalent in Java land would be JNA. They are both "slow" (partly because of libffi) and support only C APIs, but they can dynamically link at runtime without having to build anything, and I'm assuming that's why there is no CI for Julia, for example. So, is the plan for Python to stick with ctypes? Browsing through https://github.com/apache/incubator-mxnet/issues/17097 I guess that's still not settled? In my opinion, it would make sense to harmonize the strategy of the binding for Java with the one for Python.

leezu commented 3 years ago

BTW, there's one thing we've neglected to cover. I was under the impression that MXNet was using Cython to access the C API for its Python binding, but it looks like it's using ctypes.

MXNet supports both cython and ctypes (fallback) for the Python interface. It depends on your build configuration. https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L91 We may want to change the default for MXNet 2

I don't really want to deal with CI, especially Jenkins, it's a major time sink and completely unnecessary with services like GitHub Actions these days

It's also fine to use Github Actions if that's easier for you. The main reason for using Jenkins is that the MXNet test suite is too large for a free service such as Github Actions and that there are also GPU tests involved. Java tests can initially run on Github Actions and be migrated later to Jenkins based on need.

saudet commented 3 years ago

I've updated my fork with a workflow for Java build on GitHub Actions: https://github.com/saudet/incubator-mxnet/commit/2be05405090e388bdfcb52d40d9419ca7aa038c4 Please let me know what you think of this!

It's currently building and testing for Linux (CentOS 7), Mac, and Windows on x86_64 with and without CUDA: https://github.com/saudet/incubator-mxnet/actions/runs/437312011 (It looks like the build doesn't work for CUDA 11.1 with Visual Studio 2019 yet, but that's unrelated to Java.)

Since my account at Sonatype doesn't have deploy access to org.apache, the artifacts are getting deployed here for now: https://oss.sonatype.org/content/repositories/snapshots/org/bytedeco/mxnet/2.0-SNAPSHOT/

But this can be changed by updating only a single line here: https://github.com/saudet/incubator-mxnet/blob/add-javacpp/java/build.gradle#L8

In any case, the javadoc secondary artifact also gets deployed as part of the build there. Where does the publishing to the main site happen? Somewhere in here by the looks of it: https://github.com/apache/incubator-mxnet/tree/master/ci/publish/website We can fetch the latest javadoc archive this way, so I assume we could add that to the scripts?

$ mvn dependency:get -Dartifact=org.bytedeco:mxnet:2.0-SNAPSHOT:javadoc
$ unzip ~/.m2/repository/org/bytedeco/mxnet/2.0-SNAPSHOT/mxnet-2.0-SNAPSHOT-javadoc.jar -d ...

It's also fine to use Github Actions if that's easier for you. The main reason for using Jenkins is that the MXNet test suite is too large for a free service such as Github Actions and that there are also GPU tests involved. Java tests can initially run on Github Actions and be migrated later to Jenkins based on need.

For that, GitHub Actions now support self-hosted runners, where we just need to provision some machines on the cloud somewhere, and install the equivalent of Jenkins Agent on them, and that's it. Much easier than maintaining Jenkins.

leezu commented 3 years ago

Thank you @saudet. You can take a look at https://infra.apache.org/publishing-maven-artifacts.html for more information on the Apache Software Foundation (ASF) maven artifact publishing process. Summary: Release candidate artifacts are pushed to a staging area and can be promoted after the release vote passed.

One thing to note is that ASF policies do not allow publishing unreleased (nightly) artifacts to the general public. Those should be placed at a special location and only used by interested community members. You can take a look at http://www.apache.org/legal/release-policy.html#publication and this FAQ entry http://www.apache.org/legal/release-policy.html#host-rc Do you have any suggestion on how to best handle this with your GitHub Actions script / Maven?

For that, GitHub Actions now support self-hosted runners, where we just need to provision some machines on the cloud somewhere, and install the equivalent of Jenkins Agent on them, and that's it. Much easier than maintaining Jenkins.

Github Actions isn't very mature yet. You can see in the doc that "Self-hosted runners on GitHub do not have guarantees around running in ephemeral clean virtual machines, and can be persistently compromised by untrusted code in a workflow." I don't think that's acceptable for projects accepting contributions from the general public.

leezu commented 3 years ago

I downloaded mxnet-2.0-20201222.141246-19-linux-x86_64.jar and found that

% ldd org/apache/mxnet/internal/c_api/linux-x86_64/libmxnet.so
        linux-vdso.so.1 (0x00007fff65fdc000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f46015a3000)
        libgfortran.so.3 => not found
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f4601598000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f4601575000)
        libgomp.so.1 => /lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f4601533000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f4601352000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f4601201000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f46011e6000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4600ff4000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f460b892000)

As libgfortran has changed its ABI a few times over the years, you will need to include libgfortran.so in the jar (which we can distribute under the AL2 license thanks to the GCC Runtime Library Exception). However, you must not include libquadmath.so (a dependency of libgfortran.so) as it is GPL-licensed.

For the GPU version mxnet-2.0-20201222.141246-19-linux-x86_64-gpu.jar, would it make sense to use cu110 instead of gpu if it is built with CUDA 11.0, etc.?

marcoabreu commented 3 years ago

Regarding security: I think that the quoted paragraph has the same (in)securities as our jenkins setup, doesn't it?

leezu commented 3 years ago

I don't think so. Microsoft specifically says "We recommend that you do not use self-hosted runners with public repositories." It indicates to me that they have very little confidence in their security model. https://docs.github.com/en/free-pro-team@latest/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories

marcoabreu commented 3 years ago

Yes I think they are mentioning the same security problem we are having with our jenkins slaves. Any user could run arbitrary code and install a rootkit. Hence the separation towards restricted slaves.

So from that point of view, I don't consider the github actions self runner any less secure than our jenkins slaves. But of course still insecure.

leezu commented 3 years ago

The problem with runners I had in mind is that there used to be no API to start new instances for each job; rather, the instances had to be up and running all the time and would be re-used for all jobs. Thus any compromise would be truly persistent. We don't do that in our Jenkins setup, where instances are terminated from time to time.

But I just checked the GitHub documentation and the Microsoft team has resolved this issue; they now provide an API that can provision new runners on demand. So if there are volunteers, it should be fine to migrate to GitHub Actions. For example, https://040code.github.io/2020/05/25/scaling-selfhosted-action-runners

saudet commented 3 years ago

Thank you @saudet. You can take a look at https://infra.apache.org/publishing-maven-artifacts.html for more information on the Apache Software Foundation (ASF) maven artifact publishing process. Summary: Release candidate artifacts are pushed to a staging area and can be promoted after the release vote passed.

Thanks for the links! I've been publishing to the Maven Central Repository, I know how that works.

One thing to note is that ASF policies do not allow publishing unreleased (nightly) artifacts to the general public. Those should be placed at special location and only used by interested community members. You can take a look at http://www.apache.org/legal/release-policy.html#publication and this FAQ entry http://www.apache.org/legal/release-policy.html#host-rc Do you have any suggestion how to best handle it with your Github Actions script / Maven?

It doesn't sound to me like they forbid publishing snapshots, just that it shouldn't be documented, which is weird, but whatever. It should be alright to deploy snapshots and keep it a "secret", no? They say we "should" do this and that, but if none of their services offers support for Maven artifacts, I suppose this means we can use something else, right?

As libgfortran has changed their ABI a few times over the years, you will need to include libgfortran.so in the jar (which we can distribute under AL2 License thanks to the GCC Runtime Library Exception). However, you must not include libquadmath.so (dependency of libgfortran.so) as it is GPL licensed.

Yes, that's not a problem. However, if we don't have libquadmath.so, libgfortran.so isn't going to load, so is it still useful?

For the gpu version mxnet-2.0-20201222.141246-19-linux-x86_64-gpu.jar, would it make sense to use cu110 instead of gpu if built with cuda 11.0 etc?

I guess? :) In any case, that's not a problem either. However, it's becoming increasingly irrelevant to try to support multiple versions of CUDA given their accelerating release cycle.

saudet commented 3 years ago

FWIW, it looks to me like libquadmath is LGPL, not GPL: https://github.com/gcc-mirror/gcc/blob/master/libquadmath/COPYING.LIB