eaplatanios / tensorflow_scala

TensorFlow API for the Scala Programming Language
http://platanios.org/tensorflow_scala/
Apache License 2.0

Error during restore graph in multi-thread environment #75

Closed lucataglia closed 6 years ago

lucataglia commented 6 years ago

I am unable to restore the TensorFlow graph in a multi-threaded application. I have more than one thread running and need to restore a different session for each thread, but I cannot pass the Saver in from outside, so every thread needs to call tf.Saver.fromMetaGraphDef(mgf).

Here is some example code that simulates my situation:

import java.io.{BufferedInputStream, FileInputStream}
import java.nio.file.{Path, Paths}

import org.platanios.tensorflow.api._
import org.platanios.tensorflow.api.ops.variables.Saver
import org.tensorflow.framework.MetaGraphDef

object MultiThreading {
  def main(args: Array[String]): Unit = {
    for (i <- 1 to 2) {
      new Thread {
        override def run(): Unit = {
          println("starting thread")
          val modelPath = "model-store-python/my-model"
          val checkpoint: Path = Paths.get(modelPath)
          val metaPath = Paths.get(checkpoint + ".meta")
          val mgf = MetaGraphDef.parseFrom(new BufferedInputStream(new FileInputStream(metaPath.toFile)))
          val saver: Saver = tf.Saver.fromMetaGraphDef(mgf)   // <--- crash here
          // val session = Session()
          // saver.restore(session, checkpoint)
        }
      }.run()
    }
  }
}

Here the error:

objc[27785]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/bin/java (0x1098a04c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_144.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x1099244e0). One of the two will be used. Which one is undefined.
starting thread
2018-02-05 16:57:29.264 [main] INFO  TensorFlow Native - Extracting the 'tensorflow_framework' native library to /var/folders/60/4vxzt5fs3md_nx8pq4gw6nph0000gn/T/tensorflow_scala_native_libraries3425592320071619266/libtensorflow_framework.so.
2018-02-05 16:57:29.505 [main] INFO  TensorFlow Native - Copied 11538020 bytes to /var/folders/60/4vxzt5fs3md_nx8pq4gw6nph0000gn/T/tensorflow_scala_native_libraries3425592320071619266/libtensorflow_framework.so.
2018-02-05 16:57:29.506 [main] INFO  TensorFlow Native - Extracting the 'tensorflow' native library to /var/folders/60/4vxzt5fs3md_nx8pq4gw6nph0000gn/T/tensorflow_scala_native_libraries3425592320071619266/libtensorflow.so.
2018-02-05 16:57:30.366 [main] INFO  TensorFlow Native - Copied 117249456 bytes to /var/folders/60/4vxzt5fs3md_nx8pq4gw6nph0000gn/T/tensorflow_scala_native_libraries3425592320071619266/libtensorflow.so.
2018-02-05 16:57:30.367 [main] INFO  TensorFlow Native - Extracting the 'tensorflow_jni' native library to /var/folders/60/4vxzt5fs3md_nx8pq4gw6nph0000gn/T/tensorflow_scala_native_libraries3425592320071619266/libtensorflow_jni.so.
2018-02-05 16:57:30.388 [main] INFO  TensorFlow Native - Copied 638888 bytes to /var/folders/60/4vxzt5fs3md_nx8pq4gw6nph0000gn/T/tensorflow_scala_native_libraries3425592320071619266/libtensorflow_jni.so.
2018-02-05 16:57:30.476 [main] INFO  TensorFlow Native - Extracting the 'tensorflow_ops' native library to /var/folders/60/4vxzt5fs3md_nx8pq4gw6nph0000gn/T/tensorflow_scala_native_libraries3425592320071619266/libtensorflow_ops.so.
2018-02-05 16:57:30.479 [main] INFO  TensorFlow Native - Copied 79736 bytes to /var/folders/60/4vxzt5fs3md_nx8pq4gw6nph0000gn/T/tensorflow_scala_native_libraries3425592320071619266/libtensorflow_ops.so.
2018-02-05 16:57:30.496646: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX AVX2 FMA
starting thread
Exception in thread "main" org.platanios.tensorflow.jni.InvalidArgumentException: Node name 'foo_inputs' already exists in the Graph
    at org.platanios.tensorflow.jni.Graph$.importGraphDef(Native Method)
    at org.platanios.tensorflow.api.core.Graph.importGraphDef(Graph.scala:555)
    at org.platanios.tensorflow.api.core.Graph.importMetaGraphDef(Graph.scala:631)
    at org.platanios.tensorflow.api.ops.variables.Saver$.fromMetaGraphDef(Saver.scala:487)
    at io.MultiThreading$$anonfun$main$1$$anon$1.run(MultiThreading.scala:20)
    at io.MultiThreading$$anonfun$main$1.apply$mcVI$sp(MultiThreading.scala:24)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    at io.MultiThreading$.main(MultiThreading.scala:12)
    at io.MultiThreading.main(MultiThreading.scala)

Process finished with exit code 1

I don't understand whether I am doing something wrong or whether this operation is not yet supported by the Scala API in a multi-threaded environment. Is there a way to get a Saver instance that has already been created by another thread?

mandar2812 commented 6 years ago

I don't know enough about the multi-threaded context to comment, but this could be similar to an issue I filed some time back (#56).

lucataglia commented 6 years ago

@mandar2812 It seems that using tf.createWith(graph = Graph()) {...} fixes the problem. I will run some more tests before confirming that my problem is completely gone. In the meantime, thank you!
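
For anyone landing here later, this is a minimal sketch of that workaround, adapted from the snippet in the original report. It assumes that the "Node name 'foo_inputs' already exists in the Graph" error comes from both calls to tf.Saver.fromMetaGraphDef importing into the same default graph, so each thread wraps the import in its own fresh Graph(). The object name MultiThreadingRestore and the import paths are assumptions, as is the expectation that Session() inside the block picks up the graph in scope. The sketch calls start() and join() rather than run(), because run() executes the body on the calling thread, which is why the stack trace above reports thread "main".

```scala
import java.io.{BufferedInputStream, FileInputStream}
import java.nio.file.{Path, Paths}

// Assumed import paths for the TensorFlow Scala API and the generated protobuf classes.
import org.platanios.tensorflow.api._
import org.tensorflow.framework.MetaGraphDef

// Hypothetical object name, used only for this sketch.
object MultiThreadingRestore {
  def main(args: Array[String]): Unit = {
    val checkpoint: Path = Paths.get("model-store-python/my-model")
    val metaPath: Path = Paths.get(checkpoint.toString + ".meta")

    val threads = (1 to 2).map { i =>
      new Thread {
        override def run(): Unit = {
          // Parse the MetaGraphDef and close the stream afterwards.
          val in = new BufferedInputStream(new FileInputStream(metaPath.toFile))
          val mgf = try MetaGraphDef.parseFrom(in) finally in.close()

          // Give each thread its own graph so imported node names cannot collide.
          tf.createWith(graph = Graph()) {
            val saver = tf.Saver.fromMetaGraphDef(mgf)
            // Assumption: Session() uses the graph currently in scope.
            val session = Session()
            saver.restore(session, checkpoint)
            println(s"thread $i restored the checkpoint")
          }
        }
      }
    }

    // start() launches real threads; join() waits for both to finish.
    threads.foreach(_.start())
    threads.foreach(_.join())
  }
}
```

Each thread ends up with an independent graph, Saver, and Session, so nothing needs to be shared or passed in from outside.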

eaplatanios commented 6 years ago

@lucaRadicalbit @mandar2812 Thanks for suggesting that, Mandar! I was going to suggest the same thing. :)

mandar2812 commented 6 years ago

@lucaRadicalbit @eaplatanios Happy to help 😊