eaplatanios / tensorflow_scala

TensorFlow API for the Scala Programming Language
http://platanios.org/tensorflow_scala/
Apache License 2.0

Errors with 0.1.2-SNAPSHOT version during runtime and when publishing in local cache (Referring to #87) #91

Closed lucataglia closed 6 years ago

lucataglia commented 6 years ago

This morning I downloaded the 0.1.2-SNAPSHOT version, and now when I run my application I get this error while restoring the TensorFlow graph (fromMetaGraphDef method):

dyld: lazy symbol binding failed: Symbol not found: _TF_TryEvaludyld: lazy symbol binding failedateConstant
  Referenced from: /: Symbol not found: _TF_TryEvaluprivate/var/folders/60/4vxzt5fs3ateConstant
  Referenced from: /md_nx8pq4gw6nph0000gn/T/tensorflprivate/var/folders/60/4vxzt5fs3ow_scala_native_libraries7692440md_nx8pq4gw6nph0000gn/T/tensorflow_scala_native_libraries7692440212240027885/libtensorflow_jni.so
  Expected in: /private/var/folders/60/4vxzt5fs3md_nx8pq4gw6nph0000gn/T/tensorflow_scala_native_libraries7692440212240027885/l212240027885/libtensorflow_jni.sibtensorflow.so

o
  Expected in: /private/var/folders/60/4vxzt5fs3md_nx8pq4gw6npdyld: Symbol not found: _TF_TryEh0000gn/T/tensorflow_scala_nativvaluateConstant
  Referenced froe_libraries7692440212240027885/lm: /private/var/folders/60/4vxzt5fs3md_nx8pq4gw6nph0000gn/T/tensibtensorflow.so

orflow_scala_native_libraries7692440212240027885/libtensorflow_jni.so
  Expected in: /private/vadyld: Symbol not found: _TF_TryEr/folders/60/4vxzt5fs3md_nx8pq4gvaluateConstant
  Referenced frow6nph0000gn/T/tensorflow_scala_nm: /private/var/folders/60/4vxztative_libraries769244021224002785fs3md_nx8pq4gw6nph0000gn/T/tens85/libtensorflow.so

orflow_scala_native_libraries7692440212240027885/libtensorflow_jni.so
  Expected in: /private/var/folders/60/4vxzt5fs3md_nx8pq4gw6nph0000gn/T/tensorflow_scala_native_libraries7692440212240027885/libtensorflow.so

dyld: lazy symbol binding failed: Symbol not found: _TF_TryEvaluateConstant
  Referenced from: /private/var/folders/60/4vxzt5fs3md_nx8pq4gw6nph0000gn/T/tensorflow_scala_native_libraries7692440/usr/local/Cellar/sbt/1.1.0/libexec/bin/sbt-launch-lib.bash: line 58: 36665 Abort trap: 6           "$@"

Then I cloned the repo and ran sbt publishLocal to publish the tensorflow_scala project at commit 7912d779, so that I could investigate the problem above, but I got this error:

[error] Undefined symbols for architecture x86_64:
[error]   "_TF_TryEvaluateConstant", referenced from:
[error]       _Java_org_platanios_tensorflow_jni_Op_00024_tryEvaluateConstant in op.cc.o
[error] ld: symbol(s) not found for architecture x86_64
[error] clang: error: linker command failed with exit code 1 (use -v to see invocation)
[error] make[2]: *** [libtensorflow_jni.so] Error 1
[error] make[1]: *** [CMakeFiles/tensorflow_jni.dir/all] Error 2
[error] make: *** [all] Error 2
[error] java.lang.RuntimeException: Failed to build the native library. Exit code: 2.
[error]     at scala.sys.package$.error(package.scala:27)
[error]     at BuildTool$ConfigureMakeInstall$Instance.libraries(BuildTool.scala:61)
[error]     at BuildTool$ConfigureMakeInstall$Instance.libraries$(BuildTool.scala:58)
[error]     at BuildTool$CMake$$anon$3.libraries(BuildTool.scala:113)
[error]     at JniNative$.$anonfun$settings$13(JniNative.scala:92)
[error]     at scala.Function1.$anonfun$compose$1(Function1.scala:44)
[error]     at sbt.internal.util.$tilde$greater.$anonfun$$u2219$1(TypeFunctions.scala:39)
[error]     at sbt.std.Transform$$anon$4.work(System.scala:66)
[error]     at sbt.Execute.$anonfun$submit$2(Execute.scala:262)
[error]     at sbt.internal.util.ErrorHandling$.wideConvert(ErrorHandling.scala:16)
[error]     at sbt.Execute.work(Execute.scala:271)
[error]     at sbt.Execute.$anonfun$submit$1(Execute.scala:262)
[error]     at sbt.ConcurrentRestrictions$$anon$4.$anonfun$submitValid$1(ConcurrentRestrictions.scala:174)
[error]     at sbt.CompletionService$$anon$2.call(CompletionService.scala:36)
[error]     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error]     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[error]     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[error]     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[error]     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[error]     at java.lang.Thread.run(Thread.java:748)
[error] (jni / nativeCompile) Failed to build the native library. Exit code: 2.
[error] Total time: 69 s, completed Mar 21, 2018 2:42:34 PM

eaplatanios commented 6 years ago

That's because of the version of the TensorFlow native library that you're using. The code is currently using the master branch of the TensorFlow repository. This will be compatible with version 1.7.0 of TensorFlow. You can either compile TensorFlow from sources (using the master branch) to obtain the libtensorflow.so and libtensorflow_framework.so libraries, or download and extract them from here (for Linux CPU) or here (for Linux GPU).
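To check whether a given libtensorflow.so is recent enough, one option is to look for the missing symbol in its exported symbol table. The following is only a rough sketch: it assumes `nm` is installed, and `defines_symbol` is a hypothetical helper name, not part of this project or of TensorFlow.

```shell
# Hypothetical helper: reads `nm` output on stdin and succeeds only if the
# given symbol is listed as a defined text (T) symbol. The optional leading
# underscore covers macOS, where C symbols are prefixed with "_".
defines_symbol() {
  grep -Eq "[[:space:]]T[[:space:]]+_?$1\$"
}

# Example usage (the library path is a placeholder, adjust it to your setup;
# on Linux `nm -D` may be needed to read the dynamic symbol table):
#   nm -g /path/to/libtensorflow.so | defines_symbol TF_TryEvaluateConstant \
#     && echo "symbol found" || echo "symbol missing: library is too old"
```

If the symbol is missing, the library most likely predates the TensorFlow version that the current code expects, which would match both the dyld error at runtime and the linker error during sbt publishLocal.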

lucataglia commented 6 years ago

@eaplatanios OK, so just to be sure: I can no longer rely on the precompiled binaries that you offer, and I have to compile the binaries myself, right? Anyway, thank you for the quick response :)

eaplatanios commented 6 years ago

@lucaRadicalbit I'm waiting for the official 1.7.0 release and I'll then update the precompiled binaries too (I might even do so earlier if I find some time). :)

Let me know if all works fine for you when you compile the binaries yourself.

And no worries! I'm glad you're using this library and reporting bugs. :)

lucataglia commented 6 years ago

@eaplatanios Thank you for your patience :) I have now tried the following:

  1. Clean all the .ivy/cache and .ivy/local folders
  2. Download your repo in order to publishLocal your project
  3. Modify the version name of your project
  4. Modify the namespace of your project
  5. Run sbt +publishLocal
  6. Go to my project and update the line that declares the dependency so that it refers to the artifacts I published locally above (new version and new namespace)
  7. Run the application

But I got this error:

#
#  SIGSEGV (0xb)[thread 49923 also had an error] at pc=0x000000011f6ea422
, pid=60487, tid=0x000000000000c503
#
# JRE version: Java(TM) SE Runtime Environment (8.0_151-b12) (build 1.8.0_151-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.151-b12 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [libtensorflow_framework.so+0x178422]  _ZN10tensorflow6Status12SlowCopyFromEPKNS0_5StateE+0x22
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# hs_err_pid60487.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
/usr/local/Cellar/sbt/1.1.0/libexec/bin/sbt-launch-lib.bash: line 58: 60487 Abort trap: 6           "$@"

eaplatanios commented 6 years ago

@lucaRadicalbit This should be fixed in 45c861682907d536670a2e77a534528485ba1573. Please reopen this issue if it persists. :)

By the way, you should reset your Ivy cache again in order to pull the newly published artifacts.