deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0

Failed to download libraries #2247

Open waicool20 opened 1 year ago

waicool20 commented 1 year ago

Description

After updating pytorch-engine to 0.20.0, some native library files fail to download. The files aren't on your cloud instances, so DJL just throws an error.

Expected Behavior

The native libraries download properly.

Error Message

ai.djl.engine.EngineException: Cannot download jni files: https://publish.djl.ai/pytorch/1.9.1/jnilib/0.20.0/linux-x86_64/cu111/libdjl_torch.so
    at ai.djl.pytorch.jni.LibUtils.downloadJniLib(LibUtils.java:515)
    at ai.djl.pytorch.jni.LibUtils.findJniLibrary(LibUtils.java:252)
    at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:80)
    at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:54)
    at ai.djl.pytorch.engine.PtEngineProvider.getEngine(PtEngineProvider.java:40)
    at ai.djl.engine.Engine.getEngine(Engine.java:186)
    at ai.djl.engine.Engine.getInstance(Engine.java:141)
Caused by: java.io.FileNotFoundException: https://publish.djl.ai/pytorch/1.9.1/jnilib/0.20.0/linux-x86_64/cu111/libdjl_torch.so
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1993)
    at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589)
    at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224)
    at java.base/java.net.URL.openStream(URL.java:1161)
    at ai.djl.util.Utils.openUrl(Utils.java:459)
    at ai.djl.util.Utils.openUrl(Utils.java:443)
    at ai.djl.pytorch.jni.LibUtils.downloadJniLib(LibUtils.java:509)
    ... 12 more

How to Reproduce?

I've changed a simple application from

    implementation("ai.djl.pytorch:pytorch-engine:0.16.0")
    implementation("ai.djl.pytorch:pytorch-native-auto:1.9.1")

to

    implementation("ai.djl.pytorch:pytorch-engine:0.20.0")
    implementation("ai.djl.pytorch:pytorch-native-auto:1.9.1")

Steps to reproduce

Launch a simple program containing this line, which triggers loading of the native libraries:

        Engine.getInstance()

What have you tried to solve it?

I assume these files are missing from your servers, so nothing can really be done other than rolling back.
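One way to confirm that the file really is absent, rather than blocked by a local network problem, is to probe the URL directly. The sketch below is a hypothetical helper, not part of DJL: `JniUrlProbe`, `jniUrl` and `headStatus` are illustrative names, and the URL layout is inferred only from the single URL in the error message above, so it may not hold for other versions or platforms.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

/** Hypothetical helper for this thread; not part of the DJL API. */
final class JniUrlProbe {

    /**
     * Rebuilds the JNI download URL from its parts.
     * Assumption: the path layout matches the one URL shown in the stack trace.
     */
    static String jniUrl(String pytorchVersion, String djlVersion, String platform, String flavor) {
        return "https://publish.djl.ai/pytorch/" + pytorchVersion
                + "/jnilib/" + djlVersion
                + "/" + platform
                + "/" + flavor
                + "/libdjl_torch.so";
    }

    /** Sends a HEAD request and returns the HTTP status code without downloading the body. */
    static int headStatus(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("HEAD");
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // The combination from the report: PyTorch 1.9.1 natives with engine 0.20.0.
        String url = jniUrl("1.9.1", "0.20.0", "linux-x86_64", "cu111");
        System.out.println(url + " -> HTTP " + headStatus(url));
    }
}
```

A status outside the 2xx range for this URL would be consistent with the `FileNotFoundException` in the trace, which `HttpURLConnection` typically raises when the server reports that the resource does not exist.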

Environment Info

N/A

frankfliu commented 1 year ago

@waicool20

  1. ai.djl.pytorch:pytorch-native-auto is no longer needed; simply removing it will work.
  2. PyTorch 1.9.1 is not supported by 0.20.0. 0.20.0 supports 1.11.0, 1.12.1 and 1.13.0, see: https://docs.djl.ai/master/engines/pytorch/pytorch-engine/index.html
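The version constraint in the second point can be expressed as a small pre-flight check. This is only a sketch: `PyTorchCompat` and `isSupported` are hypothetical names that do not exist in DJL, and the table holds just the three PyTorch versions listed above for engine 0.20.0. Other engine versions are deliberately left out instead of being guessed.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical compatibility table; not part of the DJL API. */
final class PyTorchCompat {

    // Only the pairing stated in this thread: engine 0.20.0 -> PyTorch 1.11.0, 1.12.1, 1.13.0.
    private static final Map<String, List<String>> SUPPORTED =
            Map.of("0.20.0", List.of("1.11.0", "1.12.1", "1.13.0"));

    /** Returns true only if the engine version is known and lists the given PyTorch version. */
    static boolean isSupported(String djlVersion, String pytorchVersion) {
        List<String> versions = SUPPORTED.get(djlVersion);
        return versions != null && versions.contains(pytorchVersion);
    }

    public static void main(String[] args) {
        System.out.println(isSupported("0.20.0", "1.9.1"));  // prints false
        System.out.println(isSupported("0.20.0", "1.13.0")); // prints true
    }
}
```

An engine version missing from the table is reported as unsupported, so the check fails closed when it has no information.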

waicool20 commented 1 year ago

That seems to work and GPU inference is fine, but when I force it to use the CPU by adding this to Gradle:

    implementation("ai.djl.pytorch:pytorch-native-cpu:1.13.0:linux-x86_64")

it aborts with a very nondescript error:

Program aborted due to an unhandled Error:
Unable to find target for this triple (no targets are registered)

frankfliu commented 1 year ago

The error seems to be related to your JIT-traced model with PyTorch 1.13.0: https://discuss.pytorch.org/t/calling-forward-on-torchscript-model-multiple-times-leads-to-error/154990/3

Can you try PyTorch 1.12.1?

waicool20 commented 1 year ago

1.12.1 does not work, and neither does 1.11.0.

That link describes a failure on repeated forward calls, but here it happens on the first forward/predict call.

frankfliu commented 1 year ago

Can you try it with Python:

    python3 -m pip install torch==1.13.0+cpu -f https://download.pytorch.org/whl/torch_stable.html