eaplatanios / tensorflow_scala

TensorFlow API for the Scala Programming Language
http://platanios.org/tensorflow_scala/
Apache License 2.0

Segfault when using pre-compiled TensorFlow dynamic library #8

Closed sbrunk closed 7 years ago

sbrunk commented 7 years ago

When I try to use the pre-compiled libtensorflow 1.2.0-rc0, I get a segfault, apparently caused by the call to getAttrBool.

It works fine when I compile libtensorflow.so myself (I tried the CPU version with default settings from the submodule commit eb11d6b, although newer versions seem to work as well).

Any idea what might be causing this?

System: Ubuntu 17.04 (64-bit, OpenJDK 8)

JVM error message:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f994da88db4, pid=26404, tid=0x00007f997b463700
#
# JRE version: OpenJDK Runtime Environment (8.0_131-b11) (build 1.8.0_131-8u131-b11-0ubuntu1.17.04.1-b11)
# Java VM: OpenJDK 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libtensorflow.so+0x1a35db4]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  T H R E A D  ---------------

Current thread (0x00007f997c016000):  JavaThread "run-main-0" [_thread_in_native, id=26840, stack(0x00007f997b363000,0x00007f997b464000)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000001d

Registers:
RAX=0x00000000ffffffff, RBX=0x0000000000000000, RCX=0x0000000000000074, RDX=0x000000000000001d
RSP=0x00007f997b4608f0, RBP=0x00007f997b460920, RSI=0x0000000000000000, RDI=0x000000000000000d
R8 =0x00007f997b4608f8, R9 =0x0000000000000008, R10=0x0000000000000008, R11=0x00007f9a14a5b790
R12=0x00007f9946456940, R13=0x00007f994662cab0, R14=0x00007f994662cab0, R15=0x000000000000000b
RIP=0x00007f994da88db4, EFLAGS=0x0000000000010286, CSGSFS=0x002b000000000033, ERR=0x0000000000000006
  TRAPNO=0x000000000000000e

Top of Stack: (sp=0x00007f997b4608f0)
0x00007f997b4608f0:   0000000000000000 00007f994649ca18
0x00007f997b460900:   00007f997b460920 0000000000000000
0x00007f997b460910:   00007f997b460950 00007f994662cab0
0x00007f997b460920:   00007f997b460990 00007f994c2ea188
0x00007f997b460930:   0000000000000008 00007f9964000020
0x00007f997b460940:   0000000000000000 00007f994662cab0
0x00007f997b460950:   00007f994649c7c8 00007f994649c8d0
0x00007f997b460960:   00007f99438162e0 00007f994662cab0
0x00007f997b460970:   00007f997b460aa0 00007f994662c9a0
0x00007f997b460980:   00007f994662cab0 00007f997c0161e0
0x00007f997b460990:   00007f99438162e0 00007f994e5effa0
0x00007f997b4609a0:   0000000000000000 00000003a5585b00
0x00007f997b4609b0:   00007f997b460a00 ffffffffffffffff
0x00007f997b4609c0:   0000000700000003 ffffffffffffffff
0x00007f997b4609d0:   00000007c1218e98 94b3abdba5585b00
0x00007f997b4609e0:   00007f997b460a80 00007f993c4752c8
0x00007f997b4609f0:   00007f997b460a80 0000000000000000
0x00007f997b460a00:   00007f993c4752b8 00007f997b460ab8
0x00007f997b460a10:   00007f997c016000 00007f99fd017774
0x00007f997b460a20:   00007f997b460a40 00007f9a13ccd8ff
0x00007f997b460a30:   00007f993c3ecec8 00007f997b460a80
0x00007f997b460a40:   00007f997b460a40 0000000000000000
0x00007f997b460a50:   00007f997b460ab8 00007f993c4762e8
0x00007f997b460a60:   0000000000000000 00007f993c4752c8
0x00007f997b460a70:   0000000000000000 00007f997b460aa0
0x00007f997b460a80:   00007f997b460b00 00007f99fd0077d0
0x00007f997b460a90:   0000000000000000 00007f99fd011260
0x00007f997b460aa0:   00000006ad01d9f8 00007f99438162e0
0x00007f997b460ab0:   000000069671d030 00000006959bcef8
0x00007f997b460ac0:   00007f997b460ac0 00007f993c3eeff0
0x00007f997b460ad0:   00007f997b460b28 00007f993c47dff0
0x00007f997b460ae0:   0000000000000000 00007f993c3ef098 

Instructions: (pc=0x00007f994da88db4)
0x00007f994da88d94:   00 00 00 00 48 83 c4 18 5b 41 5c 41 5d 5d c3 48
0x00007f994da88da4:   83 3d 95 70 b2 00 00 48 8d 57 10 74 16 83 c8 ff
0x00007f994da88db4:   f0 0f c1 02 85 c0 7f 91 48 8d 75 df e8 3b 9b 7c
0x00007f994da88dc4:   fe eb 86 8b 50 f8 8d 4a ff 89 48 f8 89 d0 eb e4 

Register to memory mapping:

RAX=0x00000000ffffffff is an unknown value
RBX=0x0000000000000000 is an unknown value
RCX=0x0000000000000074 is an unknown value
RDX=0x000000000000001d is an unknown value
RSP=0x00007f997b4608f0 is pointing into the stack for thread: 0x00007f997c016000
RBP=0x00007f997b460920 is pointing into the stack for thread: 0x00007f997c016000
RSI=0x0000000000000000 is an unknown value
RDI=0x000000000000000d is an unknown value
R8 =0x00007f997b4608f8 is pointing into the stack for thread: 0x00007f997c016000
R9 =0x0000000000000008 is an unknown value
R10=0x0000000000000008 is an unknown value
R11=0x00007f9a14a5b790: <offset 0x192790> in /lib/x86_64-linux-gnu/libc.so.6 at 0x00007f9a148c9000
R12=0x00007f9946456940 is an unknown value
R13=0x00007f994662cab0 is an unknown value
R14=0x00007f994662cab0 is an unknown value
R15=0x000000000000000b is an unknown value

Stack: [0x00007f997b363000,0x00007f997b464000],  sp=0x00007f997b4608f0,  free space=1014k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libtensorflow.so+0x1a35db4]
C  [libtensorflow.so+0x297188]  TF_OperationGetAttrBool+0x68
C  [libtensorflow_jni.so+0x6fa0]  Java_org_platanios_tensorflow_jni_Op_00024_getAttrBool+0x100

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  org.platanios.tensorflow.jni.Op$.getAttrBool(JLjava/lang/String;)Z+0
j  org.platanios.tensorflow.api.ops.Op.$anonfun$booleanAttribute$1(Lorg/platanios/tensorflow/api/ops/Op;Ljava/lang/String;Lorg/platanios/tensorflow/api/core/Graph$Reference;)Z+8
j  org.platanios.tensorflow.api.ops.Op.$anonfun$booleanAttribute$1$adapted(Lorg/platanios/tensorflow/api/ops/Op;Ljava/lang/String;Lorg/platanios/tensorflow/api/core/Graph$Reference;)Ljava/lang/Object;+3
j  org.platanios.tensorflow.api.ops.Op$$Lambda$2195.apply(Ljava/lang/Object;)Ljava/lang/Object;+12
j  org.platanios.tensorflow.api.package$.using(Lorg/platanios/tensorflow/api/package$Closeable;Lscala/Function1;)Ljava/lang/Object;+2
j  org.platanios.tensorflow.api.ops.Op.booleanAttribute(Ljava/lang/String;)Z+17
j  org.platanios.tensorflow.api.ops.Math$Gradients$.matMulGradientCommon(Lorg/platanios/tensorflow/api/ops/Op;Lscala/collection/Seq;Ljava/lang/String;Ljava/lang/String;Z)Lscala/collection/Seq;+2
j  org.platanios.tensorflow.api.ops.Math$Gradients$.matMulGradient(Lorg/platanios/tensorflow/api/ops/Op;Lscala/collection/Seq;)Lscala/collection/Seq;+10
j  org.platanios.tensorflow.api.ops.Math$Gradients$.$anonfun$new$8(Lorg/platanios/tensorflow/api/ops/Math$Gradients$;Lorg/platanios/tensorflow/api/ops/Op;Lscala/collection/Seq;)Lscala/collection/Seq;+3
j  org.platanios.tensorflow.api.ops.Math$Gradients$$$Lambda$2097.apply(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;+12
j  org.platanios.tensorflow.api.ops.Gradients$.$anonfun$gradients$20(Lorg/platanios/tensorflow/api/ops/Op;Lscala/Function2;Lscala/collection/mutable/Seq;)Lscala/collection/Seq;+3
j  org.platanios.tensorflow.api.ops.Gradients$$$Lambda$2194.apply()Ljava/lang/Object;+12
j  org.platanios.tensorflow.api.ops.Gradients$.maybeCompile(Ljava/lang/String;Lorg/platanios/tensorflow/api/ops/Op;Lscala/Function0;)Lscala/collection/Seq;+308
j  org.platanios.tensorflow.api.ops.Gradients$.$anonfun$gradients$18(Lorg/platanios/tensorflow/api/ops/Gradients$;ZLjava/lang/String;Lscala/collection/mutable/Map;Lorg/platanios/tensorflow/api/ops/Op;Lscala/collection/mutable/Seq;Lscala/Function2;)V+38
j  org.platanios.tensorflow.api.ops.Gradients$$$Lambda$2192.apply$mcV$sp()V+28
J 23915 C1 scala.runtime.java8.JFunction0$mcV$sp.apply()Ljava/lang/Object; (10 bytes) @ 0x00007f99ff022f6c [0x00007f99ff022f00+0x6c]
j  scala.util.DynamicVariable.withValue(Ljava/lang/Object;Lscala/Function0;)Ljava/lang/Object;+14
j  org.platanios.tensorflow.api.ops.Op$.createWith(Lorg/platanios/tensorflow/api/core/Graph;Ljava/lang/String;Lscala/Function1;Lscala/collection/immutable/Set;Lscala/collection/immutable/Set;Lscala/collection/immutable/Map;Ljava/lang/String;Lscala/Function0;Lscala/util/DynamicVariable;)Ljava/lang/Object;+643
j  org.platanios.tensorflow.api.ops.Gradients$.$anonfun$gradients$15(Lorg/platanios/tensorflow/api/ops/Gradients$;ZLorg/platanios/tensorflow/api/ops/Gradients$AggregationMethod;Ljava/lang/String;Lscala/collection/mutable/Map;Lscala/collection/immutable/Set;Lorg/platanios/tensorflow/api/ops/Op;)V+252
j  org.platanios.tensorflow.api.ops.Gradients$$$Lambda$2186.apply$mcV$sp()V+28
J 23915 C1 scala.runtime.java8.JFunction0$mcV$sp.apply()Ljava/lang/Object; (10 bytes) @ 0x00007f99ff022f6c [0x00007f99ff022f00+0x6c]
j  org.platanios.tensorflow.api.ops.Gradients$.maybeColocateWith(Lorg/platanios/tensorflow/api/ops/Op;ZLscala/Function0;)Ljava/lang/Object;+118
j  org.platanios.tensorflow.api.ops.Gradients$.$anonfun$gradients$4(Lorg/platanios/tensorflow/api/ops/Gradients$;Lscala/collection/Seq;Lscala/collection/Seq;Lscala/collection/Seq;ZLorg/platanios/tensorflow/api/ops/Gradients$AggregationMethod;ZLjava/lang/String;Lscala/collection/mutable/Map;)V+277
j  org.platanios.tensorflow.api.ops.Gradients$$$Lambda$2159.apply$mcV$sp()V+36
J 23915 C1 scala.runtime.java8.JFunction0$mcV$sp.apply()Ljava/lang/Object; (10 bytes) @ 0x00007f99ff022f6c [0x00007f99ff022f00+0x6c]
j  scala.util.DynamicVariable.withValue(Ljava/lang/Object;Lscala/Function0;)Ljava/lang/Object;+14
j  org.platanios.tensorflow.api.ops.Op$.createWithNameScope(Ljava/lang/String;Lscala/collection/immutable/Set;Lscala/Function0;Lscala/util/DynamicVariable;)Ljava/lang/Object;+207
j  org.platanios.tensorflow.api.ops.Gradients$.gradients(Lscala/collection/Seq;Lscala/collection/Seq;Lscala/collection/Seq;ZLorg/platanios/tensorflow/api/ops/Gradients$AggregationMethod;ZLjava/lang/String;)Lscala/collection/Seq;+146
j  org.platanios.tensorflow.api.ops.training.optimizers.Optimizer.computeGradients(Lorg/platanios/tensorflow/api/ops/Output;Lscala/collection/Seq;Lscala/collection/immutable/Set;Lorg/platanios/tensorflow/api/ops/Gradients$GatingMethod;Lorg/platanios/tensorflow/api/ops/Gradients$AggregationMethod;Z)Lscala/collection/Seq;+239
j  org.platanios.tensorflow.api.ops.training.optimizers.Optimizer.computeGradients$(Lorg/platanios/tensorflow/api/ops/training/optimizers/Optimizer;Lorg/platanios/tensorflow/api/ops/Output;Lscala/collection/Seq;Lscala/collection/immutable/Set;Lorg/platanios/tensorflow/api/ops/Gradients$GatingMethod;Lorg/platanios/tensorflow/api/ops/Gradients$AggregationMethod;Z)Lscala/collection/Seq;+10
j  org.platanios.tensorflow.api.ops.training.optimizers.AdaGrad.computeGradients(Lorg/platanios/tensorflow/api/ops/Output;Lscala/collection/Seq;Lscala/collection/immutable/Set;Lorg/platanios/tensorflow/api/ops/Gradients$GatingMethod;Lorg/platanios/tensorflow/api/ops/Gradients$AggregationMethod;Z)Lscala/collection/Seq;+10
j  org.platanios.tensorflow.api.ops.training.optimizers.Optimizer.minimize(Lorg/platanios/tensorflow/api/ops/Output;Lscala/collection/Seq;Lscala/collection/immutable/Set;Lorg/platanios/tensorflow/api/ops/Gradients$GatingMethod;Lorg/platanios/tensorflow/api/ops/Gradients$AggregationMethod;ZLorg/platanios/tensorflow/api/ops/variables/Variable;Ljava/lang/String;)Lorg/platanios/tensorflow/api/ops/Op;+10
j  org.platanios.tensorflow.api.ops.training.optimizers.Optimizer.minimize$(Lorg/platanios/tensorflow/api/ops/training/optimizers/Optimizer;Lorg/platanios/tensorflow/api/ops/Output;Lscala/collection/Seq;Lscala/collection/immutable/Set;Lorg/platanios/tensorflow/api/ops/Gradients$GatingMethod;Lorg/platanios/tensorflow/api/ops/Gradients$AggregationMethod;ZLorg/platanios/tensorflow/api/ops/variables/Variable;Ljava/lang/String;)Lorg/platanios/tensorflow/api/ops/Op;+14
j  org.platanios.tensorflow.api.ops.training.optimizers.AdaGrad.minimize(Lorg/platanios/tensorflow/api/ops/Output;Lscala/collection/Seq;Lscala/collection/immutable/Set;Lorg/platanios/tensorflow/api/ops/Gradients$GatingMethod;Lorg/platanios/tensorflow/api/ops/Gradients$AggregationMethod;ZLorg/platanios/tensorflow/api/ops/variables/Variable;Ljava/lang/String;)Lorg/platanios/tensorflow/api/ops/Op;+14
j  org.platanios.tensorflow.examples.LinearRegression$.main([Ljava/lang/String;)V+408
j  org.platanios.tensorflow.examples.LinearRegression.main([Ljava/lang/String;)V+4
v  ~StubRoutines::call_stub
j  sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0
j  sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+100
J 18109 C1 sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (10 bytes) @ 0x00007f99ff39784c [0x00007f99ff397740+0x10c]
J 21316 C1 java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (62 bytes) @ 0x00007f9a009bc39c [0x00007f9a009bbfa0+0x3fc]
j  sbt.Run.invokeMain(Ljava/lang/ClassLoader;Ljava/lang/reflect/Method;Lscala/collection/Seq;)V+44
j  sbt.Run.run0(Ljava/lang/String;Lscala/collection/Seq;Lscala/collection/Seq;Lsbt/Logger;)V+48
j  sbt.Run.sbt$Run$$execute$1(Ljava/lang/String;Lscala/collection/Seq;Lscala/collection/Seq;Lsbt/Logger;)V+6
j  sbt.Run$$anonfun$run$1.apply$mcV$sp()V+20
j  sbt.Run$$anonfun$run$1.apply()V+1
j  sbt.Run$$anonfun$run$1.apply()Ljava/lang/Object;+1
j  sbt.Logger$$anon$4.apply()Ljava/lang/Object;+4
j  sbt.TrapExit$App.run()V+4
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub
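
For anyone triaging a similar log: the "Problematic frame" and "Native frames" lines above each carry a library name and an offset into it, and sometimes the nearest exported symbol. The following is a minimal sketch of extracting those fields from such lines. The `NativeFrame` class and its `parse` method are hypothetical helpers written for illustration; they are not part of tensorflow_scala, TensorFlow, or the JDK, and the pattern only covers the line shapes seen in this log.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading native frame lines from a JVM fatal-error log.
final class NativeFrame {
    final String library; // e.g. "libtensorflow.so"
    final long offset;    // offset of the frame inside the library
    final String symbol;  // nearest symbol reported by the JVM, or null if none

    // Matches lines such as
    //   "# C  [libtensorflow.so+0x1a35db4]"
    //   "C  [libtensorflow.so+0x297188]  TF_OperationGetAttrBool+0x68"
    private static final Pattern FRAME = Pattern.compile(
        "^#?\\s*C\\s+\\[(.+)\\+0x([0-9a-fA-F]+)\\](?:\\s+(\\S+)\\+0x[0-9a-fA-F]+)?\\s*$");

    private NativeFrame(String library, long offset, String symbol) {
        this.library = library;
        this.offset = offset;
        this.symbol = symbol;
    }

    // Returns the parsed frame, or Optional.empty() for lines that are not native (C) frames.
    static Optional<NativeFrame> parse(String line) {
        Matcher m = FRAME.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        long offset = Long.parseLong(m.group(2), 16);
        return Optional.of(new NativeFrame(m.group(1), offset, m.group(3)));
    }

    public static void main(String[] args) {
        String[] lines = {
            "# C  [libtensorflow.so+0x1a35db4]",
            "C  [libtensorflow.so+0x297188]  TF_OperationGetAttrBool+0x68",
            "j  org.platanios.tensorflow.jni.Op$.getAttrBool(JLjava/lang/String;)Z+0"
        };
        for (String line : lines) {
            parse(line).ifPresent(f -> System.out.printf(
                "%s 0x%x %s%n",
                f.library, f.offset, f.symbol == null ? "(no symbol)" : f.symbol));
        }
    }
}
```

The extracted offset can then be handed to a symbolizer such as `addr2line -f -e libtensorflow.so 0x1a35db4`, which only gives useful output if the library still contains symbol or debug information.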

eaplatanios commented 7 years ago

I have also seen this before but never got the chance to investigate. A good first step, in my opinion, would be to try to reproduce the problem with the official TensorFlow Java API. If it occurs there too, we could create an issue in the main repository.

I'm sorry for the slow response but I've been traveling with no connection. I'll be active again starting July 8th, when I'm back. :)

eaplatanios commented 7 years ago

@sbrunk That may have been fixed now. Let's check again when the new TensorFlow release comes out. The code will only work with the current master branch of the repository for now, but once they release we can come back to this and see if it's been fixed.

eaplatanios commented 7 years ago

@sbrunk This was indeed a bug on my side. Thanks a lot for catching it! It has now been fixed. :)