eaplatanios / tensorflow_scala

TensorFlow API for the Scala Programming Language
http://platanios.org/tensorflow_scala/
Apache License 2.0
936 stars, 96 forks

Problem with TF Eager ops: Add Input #112

Closed: mandar2812 closed this issue 6 years ago

mandar2812 commented 6 years ago

After switching from tf-scala 0.1.1 to 0.2.0, I get the following native-code error, which occurs non-deterministically:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000000013236578d, pid=18936, tid=0x0000000000002803
#
# JRE version: Java(TM) SE Runtime Environment (8.0_101-b13) (build 1.8.0_101-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.101-b13 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C  [libtensorflow.so+0x3f78d]  tensorflow::EagerOperation::AddInput(tensorflow::TensorHandle*)+0xd
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/mandar/Development/PlasmaML/hs_err_pid18936.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Abort trap: 6

It seems the crash originates in the AddInput function. What could be causing this?

My guess is that it is triggered by code like:

val input  = tf.learn.Input(FLOAT64, Shape(-1, tf_dataset.trainData.shape(1)))

eaplatanios commented 6 years ago

@mandar2812 Which version of the TensorFlow shared libraries are you using? Could you try using the pre-compiled ones distributed with the Scala artifacts?

mandar2812 commented 6 years ago

@eaplatanios I am using the pre-compiled TensorFlow binaries that ship with tf-scala. Are there any cache or tmp directories I should clean?

eaplatanios commented 6 years ago

@mandar2812 Could you please try version 0.2.1? There was an issue that should now be fixed.
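Upgrading is a one-line change in the sbt build. The snippet below is a minimal sketch that assumes the sbt coordinates and platform classifiers described in the project README for the 0.2.x releases. `darwin-cpu-x86_64` is chosen only because the crash log above comes from a macOS machine, so check the README for the exact classifier names.

```scala
// build.sbt (sketch): depend on tf-scala 0.2.1 together with its pre-compiled native libraries.
// Assumption: the "darwin-cpu-x86_64" classifier selects the macOS CPU build of the native libraries.
libraryDependencies += "org.platanios" %% "tensorflow" % "0.2.1" classifier "darwin-cpu-x86_64"
```

Note that `%%` appends the Scala binary version (for example `_2.11` or `_2.12`) to the artifact name. A Scala 2.11 build can therefore only resolve this dependency if 2.11 artifacts have been published for that version.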

mandar2812 commented 6 years ago

@eaplatanios I can't seem to find the Scala 2.11 artifacts of tf-scala 0.2.1. Did you upload them?

mandar2812 commented 6 years ago

So the issue seems to be caused by tensorflow-jni being replicated/cached across branches. Let's close this for now.
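The thread does not say where tf-scala extracts or caches tensorflow-jni. The sketch below is therefore only a generic diagnostic that relies on standard JVM system properties. It lists files whose names contain `tensorflow` in `java.io.tmpdir` and in each directory on `java.library.path`, along with their modification times, which can help spot an older native library that the JVM might pick up instead of the intended one. `NativeLibDiagnostics` and its methods are hypothetical helpers, not part of tf-scala. It is also only an assumption that tf-scala's copy of the library lives in one of these directories.

```scala
import java.io.File

// Hypothetical helper (not part of tf-scala): lists candidate locations
// where a stale TensorFlow native library might be lying around.
object NativeLibDiagnostics {

  // Directories listed in the JVM's java.library.path, split on the
  // platform path separator (":" on macOS and Linux).
  def libraryPathEntries(): Seq[File] =
    Option(System.getProperty("java.library.path"))
      .map(_.split(File.pathSeparator).toSeq.filter(_.nonEmpty).map(new File(_)))
      .getOrElse(Seq.empty)

  // Regular files directly inside `dir` whose name contains `fragment`,
  // sorted by name. Returns an empty Seq if `dir` is missing or unreadable
  // (File.listFiles returns null in that case).
  def matchingFiles(dir: File, fragment: String): Seq[File] =
    Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
      .filter(f => f.isFile && f.getName.contains(fragment))
      .sortBy(_.getName)

  def main(args: Array[String]): Unit = {
    val tmpDir = new File(System.getProperty("java.io.tmpdir"))
    val dirs = tmpDir +: libraryPathEntries()
    // Print each match with its modification time so old copies stand out.
    for (dir <- dirs; f <- matchingFiles(dir, "tensorflow"))
      println(s"${f.getAbsolutePath} (last modified ${new java.util.Date(f.lastModified())})")
  }
}
```

If an old copy shows up, deleting it (or the branch-specific directory that holds it) and rebuilding should force the version that matches the tf-scala artifacts to be used.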