deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0
4.17k stars 663 forks source link

Tensorflow Engine not found (failed to load from TFEngineProvider) #139

Closed humzaiqbal closed 3 years ago

humzaiqbal commented 4 years ago

Description

(A clear and concise description of what the bug is.) When trying to load a Tensorflow hub model into djl I get an engine not found exception. I'm running in Intelij and made sure to add the following to my pom.xml so that the tensorflow dependencies are there.

        <dependency>
            <groupId>ai.djl.tensorflow</groupId>
            <artifactId>tensorflow-engine</artifactId>
            <version>0.6.0</version>
        </dependency>

        <dependency>
            <groupId>ai.djl.tensorflow</groupId>
            <artifactId>tensorflow-native-auto</artifactId>
            <version>2.2.0</version>
        </dependency>

Interestingly, I don't have this problem when I use PyTorch, I am able to load and run PyTorch models just fine when the PyTorch equivalent dependencies are added

Expected Behavior

The code should run and load the model without failure

Error Message

Exception in thread "main" ai.djl.engine.EngineException: No deep learning engine found.
Please refer to https://github.com/awslabs/djl/blob/master/docs/development/troubleshooting.md for more details.
    at ai.djl.engine.Engine.getInstance(Engine.java:95)
    at ai.djl.repository.zoo.DefaultModelZoo.getSupportedEngines(DefaultModelZoo.java:71)
    at ai.djl.repository.zoo.ModelZoo.loadModel(ModelZoo.java:131)
    at ai.securiti.privaci.detection.analyzer.topic.javatopicclassifier.SimpleDummyTest.<init>(SimpleDummyTest.java:42)
    at ai.securiti.privaci.detection.analyzer.topic.javatopicclassifier.SimpleDummyTest.main(SimpleDummyTest.java:53)

How to Reproduce?

(If you developed your own code, please provide a short script that reproduces the error. For existing examples, please provide link.)

I'm attaching a zip file containing a Java file SimpleDummyTest.java running that should reproduce it SimpleDummyTest.java.zip

Steps to reproduce

(Paste the commands you ran that produced the error.)

What have you tried to solve it?

  1. I used a debugger to dive into the error and what I ultamitely found was the following: When it got to Engine.class it would try to load the engine by doing Engine engine = provider.getEngine(); and then that failed causing it to log
    Failed to load engine from: ai.djl.tensorflow.engine.TfEngineProvider

Environment Info

Please run the command ./gradlew debugEnv from the root directory of DJL (if necessary, clone DJL first). It will output information about your system, environment, and installation that can help us debug your issue. Paste the output of the command below: I didn't run with gradlew instead ran on Intelij will manually paste all relevant info


JDK: 11.0.1
IntelliJ IDEA 2018.3.3 (Community Edition)
Build #IC-183.5153.38, built on January 9, 2019
JRE: 1.8.0_152-release-1343-b26 x86_64
JVM: OpenJDK 64-Bit Server VM by JetBrains s.r.o
OS: macOS 10.13.6
humzaiqbal commented 4 years ago

Update: After debugging some more I got to a point in the TFEngine.class where the code was instantiating an instance of the TFEngine by calling the newInstance() method and when it was running EagerSession.getDefault() I got this error

java.lang.ClassNotFoundException: org.tensorflow.internal.c_api

Does this mean that the org.tensorflow library is required in order to use tensorflow in djl? My understanding was that as long as I include the pom dependencies I should be able to run it just fine (this was the case when I ran djl with PyTorch although I did have PyTorch installed on my machine)

roywei commented 4 years ago

Hi @humzaiqbal, thanks for raising the issue! There are a few things to try:

  1. Add "ai.djl.tensorflow:tensorflow-api:0.6.0" as well, although it should be automatically included as tensorflow-engine depends on it. the dependency is ai.djl.tensorflow:tensorflow-engine ->ai.djl.tensorflow:tensorflow-api -> org.tensorflow:tensorflow-core-api

  2. If you are using IDE, sometimes the service provider class is not recognized, try rebuild or recompile the provider class. See https://docs.djl.ai/master/docs/development/troubleshooting.html

  3. see if the native tensorflow libraries are actually downloaded, it should be under home directory, eg: ~/.tensorflow/cache/2.2.0-SNAPSHOT-cpu-osx-x86_64/

  4. When you debug in IDE mode, you may need to specify in VM options, -Djava.library.path=/path/to/libtensorflow.so/file, the path is same as step 3. Note, you don't need this when using gradle to run.

  5. Here are some examples on how to load TF models:

https://github.com/aws-samples/djl-demo/tree/master/pneumonia-detection https://github.com/awslabs/djl/tree/master/tensorflow/tensorflow-model-zoo/src/test/java/ai/djl/tensorflow/integration/modality/cv

The SSD Test is using a model from TF Hub. All models are in saved model bundle format.

roywei commented 4 years ago

Update: I was able to run your file with the same setting you described. Here is my pom.xml file and SimpleDummyTest.java

SimpleDummyTest.zip

I was able to load the model, but it seems to have some internal error when loading. Note by inspecting this model, you need to specify the tags to be [] during loading.

            Criteria<NDList, NDList> criteria =
                    Criteria.builder()
                            .setTypes(NDList.class, NDList.class)
                            .optArtifactId("ai.djl.localmodelzoo:encoder")
                            .optOption("Tags", new String[]{})
                            .optTranslator(translator)
                            .build();

Output:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
2020-07-22 14:47:03.484403: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-07-22 14:47:03.493307: I external/org_tensorflow/tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2020-07-22 14:47:03.513126: I external/org_tensorflow/tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2020-07-22 14:47:03.579938: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /Users/lawei/.djl.ai/cache/repo/model/undefined/ai/djl/localmodelzoo/10f287ae118acb18b9ad367e6ce28b61
2020-07-22 14:47:03.713259: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags {  }
2020-07-22 14:47:03.713301: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:295] Reading SavedModel debug info (if present) from: /Users/lawei/.djl.ai/cache/repo/model/undefined/ai/djl/localmodelzoo/10f287ae118acb18b9ad367e6ce28b61
2020-07-22 14:47:03.722141: I external/org_tensorflow/tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
2020-07-22 14:47:03.979077: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:234] Restoring SavedModel bundle.
2020-07-22 14:47:04.185743: E external/org_tensorflow/tensorflow/core/grappler/optimizers/meta_optimizer.cc:563] model_pruner failed: Invalid argument: Invalid input graph.
2020-07-22 14:47:09.509059: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:364] SavedModel load for tags {  }; Status: fail: Invalid argument: Tensor :0, specified in either feed_devices or fetch_devices was not found in the Graph. Took 5929119 microseconds.
Exception in thread "main" org.tensorflow.exceptions.TFInvalidArgumentException: Tensor :0, specified in either feed_devices or fetch_devices was not found in the Graph
    at org.tensorflow.internal.c_api.AbstractTF_Status.throwExceptionIfNotOK(AbstractTF_Status.java:87)
humzaiqbal commented 4 years ago

Thanks much for your help @roywei I followed some of your advice and added the extra dependency to my pom file and I added the directory to my VM options ~/.tensorflow/cache/2.2.0-SNAPSHOT-cpu-osx-x86_64/. When I looked at the directory I saw that the following files were present

LICENSE             libgcc_s.1.dylib        libiomp5.dylib          libjnimkldnn.dylib      libjnitensorflow.dylib      libmklml.dylib          libtensorflow.2.dylib
THIRD_PARTY_TF_JNI_LICENSES libgomp.1.dylib         libjnijavacpp.dylib     libjnimklml.dylib       libmkldnn.0.dylib       libstdc++.6.dylib       libtensorflow_framework.2.dylib

I notice there is no libtensorflow.so file, and now when I debug I get an UnsatisfiedLinkError when trying to load the TF Engine so I guessI need to look at that

frankfliu commented 4 years ago

@humzaiqbal The files in your cache folder seems correct. libtensorflow.2.dylib is equivalent to libtensorflow.so on Mac. The file might corrupted. Would you please try to clean the cache folder, and run following command:

cd djl
./gradlew debugEnv -Dai.djl.default_engine=TensorFlow

It should print debug information regard tensorflow engine loading.

roywei commented 4 years ago

Hi @humzaiqbal , did you try out my uploaded files in the previous comments? It works out of the box even without setting dependency and the java.library.path, try remove ~/.tensorflow and ~/.djl.ai to avoid corrupted files.

Also the model you provides seems to be a TF1 model, you need to specify .optOption("Tags", new String[]{}) during loading.

You can try this TF2 sentence encoder: https://tfhub.dev/google/universal-sentence-encoder/4, I can load it without any problem. (no need for optOption)

Next step is for prediction, this model requires a String Tensor as input. We will look into how to support that in DJL NDArray API. (Currently Pytorch and MXNet engine does not support String Tensors)

humzaiqbal commented 4 years ago

Thanks much for your suggestions.

@frankfliu unfortunately I'm not using gradle so that won't work for me.

@roywei I tried removing my directory and restarting but that didn't work. either. Also I can't use the TF2 option because I need USE LITE specifically for my application. If a String Tensor is required then perhaps I might be better off going to the Google Java binding for Tensorflow. I am able to load the model and feed in input tensors there currently, but if that becomes possible with djl please let me know because I am a fan of the library (made it quite easy for me to load in my PyTorch model when I was trying that some time ago :) )

frankfliu commented 4 years ago

@humzaiqbal You don't need install anything to use gradle. the gradle command work out of box in our git repo. By the way, we are using TF java binding. If TF-java can work, DJL should just work.

humzaiqbal commented 4 years ago

Ah I wasn't using the git repo for djl I just installed the maven dependencies and didn't realize you had the gradle bash script so thats why I figured I had to have something else installed

frankfliu commented 4 years ago

@humzaiqbal Do you still have issue on your end?

roywei commented 4 years ago

@humzaiqbal we have added Universal Encoder Example here could you take a look if that works for you?

stu1130 commented 3 years ago

@humzaiqbal I am going to close the stale issue. If you have any other question, feel free to reopen it