deepjavalibrary / djl

An Engine-Agnostic Deep Learning Framework in Java
https://djl.ai
Apache License 2.0

UniversalSentenceEncoder embeddings don't change regardless of input #1056

Closed markhng525 closed 3 years ago

markhng525 commented 3 years ago

Description

When I test the UniversalSentenceEncoder using the example class ./examples/src/main/java/ai/djl/examples/inference/UniversalSentenceEncoder.java with a second, different set of inputs, the encoded embeddings do not change: float[][] embeddings1 = UniversalSentenceEncoder.predict(inputs1) returns the same values as the first call.

Expected Behavior

I expect the values of embeddings and embeddings1 to be different.

How to Reproduce?

    public static void main(String[] args) throws IOException, ModelException, TranslateException {
        List<String> inputs = new ArrayList<>();
        inputs.add("The quick brown fox jumps over the lazy dog.");
        inputs.add("I am a sentence for which I would like to get its embedding");

        float[][] embeddings = UniversalSentenceEncoder.predict(inputs);

        List<String> inputs1 = new ArrayList<>();
        inputs1.add("There is a stray dog over there.");
        inputs1.add("I should see a new type of encoded output");

        float[][] embeddings1 = UniversalSentenceEncoder.predict(inputs1);

        if (embeddings == null || embeddings1 == null) {
            logger.info("This example only works for TensorFlow Engine");
        } else {
            for (int i = 0; i < inputs.size(); i++) {
                logger.info(
                        "Embedding for: " + inputs.get(i) + "\n" + Arrays.toString(embeddings[i]));
                logger.info(
                        "Embedding for: " + inputs1.get(i) + "\n" + Arrays.toString(embeddings1[i]));
            }
        }
    }
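
Comparing the two results by reading log output is error-prone, so the check can also be made explicit. The following is only a sketch of such a comparison: the class name EmbeddingCheck and the method sameEmbeddings are hypothetical helpers, not part of DJL, and the arrays in main are made-up stand-ins for the output of UniversalSentenceEncoder.predict.

```java
import java.util.Arrays;

// Hypothetical helper, not part of DJL: compares two batches of embeddings.
public class EmbeddingCheck {

    /** Returns true when both batches contain exactly the same float values. */
    public static boolean sameEmbeddings(float[][] a, float[][] b) {
        // deepEquals compares the nested float[] rows element by element.
        return Arrays.deepEquals(a, b);
    }

    public static void main(String[] args) {
        // Made-up values standing in for real model output.
        float[][] first = {{0.1f, 0.2f}, {0.3f, 0.4f}};
        float[][] second = {{0.1f, 0.2f}, {0.3f, 0.4f}};
        float[][] third = {{0.5f, 0.2f}, {0.3f, 0.9f}};

        System.out.println(sameEmbeddings(first, second)); // prints "true"
        System.out.println(sameEmbeddings(first, third));  // prints "false"
    }
}
```

In the reproduction above, sameEmbeddings(embeddings, embeddings1) returning true for two different input lists would confirm the behavior described in this issue.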

Steps to reproduce

  1. Run UniversalSentenceEncoder#main in an IDE

What have you tried to solve it?

  1. Tried loading the universal-sentence-encoder model from the local filesystem, downloaded from Google
  2. Created 2 different model instances in case Predictor<String[], float[][]> predictor = model.newPredictor() was caching something under the hood.

Environment Information

Please run the command ./gradlew debugEnv from the root directory of DJL (if necessary, clone DJL first). It will output information about your system, environment, and installation that can help us debug your issue. Paste the output of the command below:

--------- Environment Variables ---------
PATH: /Users/markh/.asdf/shims:/usr/local/opt/asdf/bin:/usr/local/bin:/usr/local/sbin:/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Users/markh/.pyenv/bin:/Users/markh/.local/lib/python3.7/site-packages/:/Applications/Julia-0.5.app/Contents/Resources/julia/bin/:/Users/markh/bin:/usr/local/bin:/Users/markh/.poetry/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/Library/TeX/texbin:/usr/local/share/dotnet:/opt/X11/bin:~/.dotnet/tools:/Library/Apple/usr/bin:/Library/Frameworks/Mono.framework/Versions/Current/Commands:/Users/markh/.local/bin:/Applications/Visual Studio Code.app/Contents/Resources/app/bin
ASDF_DIR: /usr/local/opt/asdf
MANPATH: /usr/local/share/man::
JAVA_MAIN_CLASS_12998: ai.djl.integration.util.DebugEnvironment
SDKROOT: /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
TERM: xterm-256color
HOMEBREW_PREFIX: /usr/local
LDFLAGS: -L/usr/local/opt/zlib/lib -L/usr/local/opt/bzip2/lib
__INTELLIJ_COMMAND_HISTFILE__: /Users/markh/Library/Caches/JetBrains/IntelliJIdea2021.1/terminal/history/djl-history
COMMAND_MODE: unix2003
DISPLAY: /private/tmp/com.apple.launchd.YKZbfJvMdQ/org.macosforge.xquartz:0
APP_NAME_12947: Gradle
LOGNAME: markh
HOMEBREW_REPOSITORY: /usr/local/Homebrew
PWD: /Users/markh/github/djl
XPC_SERVICE_NAME: 0
INFOPATH: /usr/local/share/info:
__CFBundleIdentifier: com.jetbrains.intellij
SHELL: /bin/zsh
CPPFLAGS: -I/usr/local/opt/zlib/include -I/usr/local/opt/bzip2/include
PAGER: less
LSCOLORS: Gxfxcxdxbxegedabagacad
JAVA_MAIN_CLASS_12947: org.gradle.wrapper.GradleWrapperMain
OLDPWD: /Users/markh/github/djl
HOMEBREW_CELLAR: /usr/local/Cellar
USER: markh
ZSH: /Users/markh/.oh-my-zsh
LOGIN_SHELL: 1
TERMINAL_EMULATOR: JetBrains-JediTerm
TMPDIR: /var/folders/zj/25nx_j7s19j3_zb5fx_rghnh0000gn/T/
SSH_AUTH_SOCK: /private/tmp/com.apple.launchd.1XDkOL7m78/Listeners
XPC_FLAGS: 0x0
TERM_SESSION_ID: 5b4d5fd9-ba06-4e93-beda-5712cb231ad4
APP_ICON_12947: /Users/markh/github/djl/media/gradle.icns
__CF_USER_TEXT_ENCODING: 0x1F5:0x0:0x0
LESS: -R
LC_CTYPE: en_US.UTF-8
SHLVL: 1
HOME: /Users/markh

-------------- Directories --------------
temp directory: /var/folders/zj/25nx_j7s19j3_zb5fx_rghnh0000gn/T
DJL cache directory: /Users/markh/.djl.ai
Engine cache directory: /Users/markh/.djl.ai

------------------ CUDA -----------------
[DEBUG] - cudart library not found.
[DEBUG] - Using cache dir: /Users/markh/.djl.ai/mxnet
[DEBUG] - Loading mxnet library from: /Users/markh/.djl.ai/mxnet/1.8.0-mkl-osx-x86_64/libmxnet.dylib
GPU Count: 0
Default Device: cpu()

----------------- Engines ---------------
Default Engine: MXNet
PyTorch: 2
[DEBUG] - Using cache dir: /Users/markh/.djl.ai/pytorch
[INFO ] - Downloading https://publish.djl.ai/pytorch-1.9.0/cpu/osx/native/lib/libtorch_cpu.dylib.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch-1.9.0/cpu/osx/native/lib/libiomp5.dylib.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch-1.9.0/cpu/osx/native/lib/libtorch.dylib.gz ...
[INFO ] - Downloading https://publish.djl.ai/pytorch-1.9.0/cpu/osx/native/lib/libc10.dylib.gz ...
[DEBUG] - Loading pytorch library from: /Users/markh/.djl.ai/pytorch/1.9.0-SNAPSHOT-20210616-cpu-osx-x86_64/0.12.0-SNAPSHOT-cpu-libdjl_torch.dylib
[INFO ] - Number of inter-op threads is 4
[INFO ] - Number of intra-op threads is 4
MXNet: 0
TensorFlow: 3
[DEBUG] - Using cache dir: /Users/markh/.djl.ai/tensorflow
[DEBUG] - Loading TensorFlow library from: /Users/markh/.djl.ai/tensorflow/2.4.1-cpu-osx-x86_64/libjnitensorflow.dylib
Warning: Could not load Loader: java.lang.UnsatisfiedLinkError: no jnijavacpp in java.library.path: [/Users/markh/Library/Java/Extensions, /Library/Java/Extensions, /Network/Library/Java/Extensions, /System/Library/Java/Extensions, /usr/lib/java, .]
2021-06-27 11:22:28.775685: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.

--------------- Hardware --------------
Available processors (cores): 8
Byte Order: LITTLE_ENDIAN
Free memory (bytes): 180037888
Maximum memory (bytes): 4294967296
Total memory available to JVM (bytes): 270532608
Heap committed: 270532608
Heap nonCommitted: 64557056
GCC: 
Apple clang version 12.0.5 (clang-1205.0.22.9)
Target: x86_64-apple-darwin20.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

frankfliu commented 3 years ago

Looks like the string tensor support I added has an issue. We don't have this problem in v0.10.0.