deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0

fail to download native engine lib (libtorch_cpu.so.gz and so on) #1117

Closed renshuo closed 3 years ago

renshuo commented 3 years ago

I hit this exception because of a poor network connection:

10:08:38.845 [main] INFO ai.djl.pytorch.jni.LibUtils - Downloading https://publish.djl.ai/pytorch-1.8.1/cpu/linux/native/lib/libtorch_cpu.so.gz ...
Exception in thread "main" ai.djl.engine.EngineException: Failed to load PyTorch native library
    at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:77)
    at ai.djl.pytorch.engine.PtEngineProvider.getEngine(PtEngineProvider.java:40)
    at ai.djl.engine.Engine.getEngine(Engine.java:152)
    at ai.djl.engine.Engine.getInstance(Engine.java:117)
    at DJLTest$package$.DJLTest(DJLTest.scala:27)
    at DJLTest.main(DJLTest.scala:25)
Caused by: java.lang.IllegalStateException: Failed to download PyTorch native library
    at ai.djl.pytorch.jni.LibUtils.findNativeLibrary(LibUtils.java:283)
    at ai.djl.pytorch.jni.LibUtils.getLibName(LibUtils.java:84)
    at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:72)
    at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:50)
    ... 5 more
Caused by: java.io.EOFException: Unexpected end of ZLIB input stream
    at java.base/java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:244)
    at java.base/java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
    at java.base/java.util.zip.GZIPInputStream.read(GZIPInputStream.java:117)
    at java.base/java.io.InputStream.transferTo(InputStream.java:782)
    at java.base/java.nio.file.Files.copy(Files.java:3155)
    at ai.djl.pytorch.jni.LibUtils.downloadPyTorch(LibUtils.java:398)
    at ai.djl.pytorch.jni.LibUtils.findNativeLibrary(LibUtils.java:281)
    ... 8 more

So, why not use sbt or Gradle to resolve the dependency?

Here is an example with OS detection.

    libraryDependencies += "org.scalafx" %% "scalafx" % "16.0.0-R24"

    libraryDependencies ++= {
      lazy val osName = System.getProperty("os.name") match {
        case n if n.startsWith("Linux")   => "linux"
        case n if n.startsWith("Mac")     => "mac"
        case n if n.startsWith("Windows") => "win"
        case _                            => throw new Exception("Unknown platform!")
      }
      Seq("base", "controls", "fxml", "graphics", "media", "swing", "web")
        .map(m => "org.openjfx" % s"javafx-$m" % "16" classifier osName)
    }

I think DJL could use the same method to resolve engine dependencies for a better user experience.
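For illustration, the JavaFX trick above could be adapted to DJL's platform-specific PyTorch artifacts roughly like this. This is a hypothetical build.sbt sketch: the `pytorch-native-cpu` coordinates, the `0.12.0`/`1.8.1` versions, and the `linux-x86_64`/`osx-x86_64`/`win-x86_64` classifier names are assumptions about DJL's published Maven artifacts, not a verified recipe.

```scala
// build.sbt — hypothetical sketch, assuming DJL publishes platform-classified
// pytorch-native-cpu jars under linux-x86_64 / osx-x86_64 / win-x86_64
libraryDependencies ++= {
  // Pick the classifier matching the build machine's OS
  val classifier = System.getProperty("os.name") match {
    case n if n.startsWith("Linux")   => "linux-x86_64"
    case n if n.startsWith("Mac")     => "osx-x86_64"
    case n if n.startsWith("Windows") => "win-x86_64"
    case _                            => throw new Exception("Unknown platform!")
  }
  Seq(
    "ai.djl.pytorch" % "pytorch-engine"     % "0.12.0",
    "ai.djl.pytorch" % "pytorch-native-cpu" % "1.8.1" classifier classifier
  )
}
```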

frankfliu commented 3 years ago

We provide both an auto-detection jar and platform-specific jars.

It's up to you to choose which classifier to use in your Gradle or sbt file. We do exactly what you describe in some of our own projects: https://github.com/deepjavalibrary/djl-demo/blob/master/canary/build.gradle#L67-L73

We only download the libtorch.so file once; once it is cached, we won't download it again. If your network is broken, you will have the same problem downloading a platform-specific jar file.
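Since the failure above came from a truncated download, the cached partial file may need to be removed before DJL will retry. A hypothetical Scala sketch, assuming the default cache root is `~/.djl.ai` (overridable via the `DJL_CACHE_DIR` environment variable) and that PyTorch artifacts live in a `pytorch` subdirectory — both assumptions, not confirmed in this thread:

```scala
import java.io.File

object ClearDjlCache {
  // Assumption: DJL caches downloaded native libs under ~/.djl.ai,
  // unless DJL_CACHE_DIR points somewhere else.
  def cacheRoot: File =
    new File(sys.env.getOrElse("DJL_CACHE_DIR", sys.props("user.home") + "/.djl.ai"))

  // Recursively delete a directory tree (children first, then the node itself)
  def deleteRecursively(f: File): Unit = {
    if (f.isDirectory) Option(f.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
    f.delete()
  }

  def main(args: Array[String]): Unit = {
    // Assumption: PyTorch natives are cached in a "pytorch" subdirectory
    val pytorchCache = new File(cacheRoot, "pytorch")
    if (pytorchCache.exists()) {
      println(s"Removing possibly corrupted cache at $pytorchCache")
      deleteRecursively(pytorchCache)
    }
  }
}
```

On the next run, DJL should then re-download the native library from scratch instead of failing on the corrupted archive.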

However, this may not work for many use cases:

  1. It's hard to detect CUDA versions in Gradle; your suggested solution only works for the CPU case. Our djl-bench project supports all engines and all CUDA versions in the same project, and the total package size is more than 30 GB (Linux only, with another 30 GB for Windows); we don't want every user to download packages they don't need.
  2. The Gradle solution doesn't work if you want to distribute your application: the distribution package would only contain the native library for your build machine, so it cannot be distributed to other platforms.

renshuo commented 3 years ago

Got it, thanks for your reply. I've learned something new.