deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0

Pytorch runtime not included when project is built as a jar #2395

Closed HishamGarout closed 1 year ago

HishamGarout commented 1 year ago

Description

We have our ONNX model and are trying to use it in our app. Running the project from the IDE works just fine, but running the jar file produces the error below. We debugged the issue and found that in the IDE runtime, supportedEngines includes both OnnxRuntime and PyTorch, while in the JAR runtime it includes OnnxRuntime only. The failure happens when we reach StackBatchifier.batchify(), on `String inputName = ((NDArray) inputs[0].get(i)).getName();`

We have the following dependencies in our gradle file:

```
api("org.apache.logging.log4j:log4j-slf4j-impl:2.18.0")
api("ai.djl:model-zoo:0.21.0-SNAPSHOT")
api("ai.djl.huggingface:tokenizers:0.21.0-SNAPSHOT")
api("ai.djl.pytorch:pytorch-model-zoo:0.21.0-SNAPSHOT")
api("ai.djl.onnxruntime:onnxruntime-engine:0.19.0")
api("org.jetbrains.kotlin:kotlin-stdlib:1.7.20")
```

Do we need to put any jar configurations for it to work?

Expected Behavior

PyTorch should be included in the supported engines.

Error Message

```
Exception in thread "main" ai.djl.translate.TranslateException: java.lang.UnsupportedOperationException: This NDArray implementation does not currently support this operation
	at ai.djl.inference.Predictor.batchPredict(Predictor.java:189)
	at ai.djl.inference.Predictor.predict(Predictor.java:126)
	at ProfanityPredictionModel.predict(ProfanityPredictionModel.kt:30)
	at TestModel.main(TestModel.kt:18)
Caused by: java.lang.UnsupportedOperationException: This NDArray implementation does not currently support this operation
	at ai.djl.ndarray.NDArrayAdapter.getAlternativeArray(NDArrayAdapter.java:1225)
	at ai.djl.ndarray.NDArrayAdapter.getNDArrayInternal(NDArrayAdapter.java:1173)
	at ai.djl.ndarray.NDArrays.stack(NDArrays.java:1825)
	at ai.djl.ndarray.NDArrays.stack(NDArrays.java:1785)
	at ai.djl.translate.StackBatchifier.batchify(StackBatchifier.java:52)
	at ai.djl.inference.Predictor.processInputs(Predictor.java:217)
	at ai.djl.inference.Predictor.batchPredict(Predictor.java:177)
	... 3 more
```

frankfliu commented 1 year ago

How do you package your jar file? Are you using a fat jar? Do you have network access when you run your app?

See our packaging demo: https://github.com/deepjavalibrary/djl-demo/tree/master/development/fatjar

HishamGarout commented 1 year ago

> How do you package your jar file? Are you using a fat jar? Do you have network access when you run your app?
>
> See our packaging demo: https://github.com/deepjavalibrary/djl-demo/tree/master/development/fatjar

We tried many configurations, including a fat jar, with no success. And we do have an internet connection.

We noticed one thing while debugging: if we change the order of the libraries in the jar build, only the first one is loaded. So when the PyTorch engine is listed above the ONNX Runtime engine, only the PyTorch engine is loaded and we get this error:

```
Exception in thread "main" ai.djl.repository.zoo.ModelNotFoundException: ModelZoo doesn't support specified engine: OnnxRuntime
	at ai.djl.repository.zoo.Criteria.loadModel(Criteria.java:121)
	at ProfanityPredictionModel.loadModel(ProfanityPredictionModel.kt:24)
	at ProfanityPredictionModel.predict(ProfanityPredictionModel.kt:29)
	at TestModel.main(TestModel.kt:18)
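For context on the "only the first engine wins" symptom: DJL discovers engines through Java's ServiceLoader, which reads provider files under `META-INF/services/`. Each engine jar ships its own copy of the `ai.djl.engine.EngineProvider` file, and a naive fat-jar build lets one copy overwrite the other, leaving only one engine registered. A correctly merged fat jar would contain a single provider file listing both engines, roughly like this (a sketch; verify the exact provider class names against your DJL version):

```
# META-INF/services/ai.djl.engine.EngineProvider
ai.djl.pytorch.engine.PtEngineProvider
ai.djl.onnxruntime.engine.OrtEngineProvider
```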

frankfliu commented 1 year ago

@HishamGarout

You might be hitting issue #940. DJL uses Java's ServiceLoader, so you need to merge the service provider resource files when creating a fat jar.

Please pay attention to the shade plugin section in our fatjar demo's pom.xml, or to Gradle's shadowJar plugin.
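With the Gradle Shadow plugin, the merge is a one-liner: `mergeServiceFiles()` concatenates the `META-INF/services/*` entries from all dependency jars instead of letting them overwrite each other. A minimal sketch in the Kotlin DSL (the Shadow plugin version shown is illustrative, not from the original report):

```kotlin
// build.gradle.kts — minimal sketch; the Shadow plugin version is illustrative
plugins {
    kotlin("jvm") version "1.7.20"
    id("com.github.johnrengelman.shadow") version "7.1.2"
}

tasks.shadowJar {
    // Merge META-INF/services/* files from all dependency jars so that
    // every DJL EngineProvider stays registered in the fat jar.
    mergeServiceFiles()
}
```

The Maven equivalent is the shade plugin's `ServicesResourceTransformer`, which is what the linked fatjar demo's pom.xml configures.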

github-actions[bot] commented 1 year ago

This issue has been automatically closed because there has been no response to our request for more information from the original author. With only the information that is currently in the issue, we don't have enough information to take action. Please reach out if you have or find the answers we need so that we can investigate further.