eaplatanios / tensorflow_scala

TensorFlow API for the Scala Programming Language
http://platanios.org/tensorflow_scala/
Apache License 2.0

GetNext method failing #63

Status: Closed (mandar2812 closed this issue 6 years ago)

mandar2812 commented 6 years ago

When running the CIFAR and other examples, I sporadically get the following exception (it is not consistently reproducible):

org.platanios.tensorflow.jni.FailedPreconditionException: GetNext() failed because the iterator has not been initialized. Ensure that you have run the initializer operation for this iterator before getting the next element.
     [[Node: Model/Iterator/Next = IteratorGetNext[output_shapes=[[?,32,32,4], [?]], output_types=[DT_UINT8, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"](Model/Iterator)]]
  org.platanios.tensorflow.jni.Session$.run(Native Method)
  org.platanios.tensorflow.api.core.client.Session.runHelper(Session.scala:137)
  org.platanios.tensorflow.api.learn.SessionWrapper.runHelper(SessionWrapper.scala:114)
  org.platanios.tensorflow.api.core.client.Session.run(Session.scala:76)
  org.platanios.tensorflow.api.learn.estimators.FileBasedEstimator$$anonfun$trainWithHooks$1.apply$mcV$sp(FileBasedEstimator.scala:160)
  org.platanios.tensorflow.api.learn.estimators.FileBasedEstimator$$anonfun$trainWithHooks$1.apply(FileBasedEstimator.scala:135)
  org.platanios.tensorflow.api.learn.estimators.FileBasedEstimator$$anonfun$trainWithHooks$1.apply(FileBasedEstimator.scala:135)
  scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
  org.platanios.tensorflow.api.ops.Op$.createWith(Op.scala:844)
  org.platanios.tensorflow.api.learn.estimators.FileBasedEstimator.trainWithHooks(FileBasedEstimator.scala:135)
  org.platanios.tensorflow.api.learn.estimators.FileBasedEstimator.train(FileBasedEstimator.scala:86)
  ammonite.$file.helios.scripts.eit_goes_cnn$.<init>(eit_goes_cnn.sc:104)
  ammonite.$file.helios.scripts.eit_goes_cnn$.<clinit>(eit_goes_cnn.sc:15)
eaplatanios commented 6 years ago

Hi Mandar,

Could you try using the in-memory estimator and tell me if the same error appears?

Thanks!

(Sent by email in reply to Mandar Chandorkar's report of Dec 9, 2017, 5:17 PM -0800.)

mandar2812 commented 6 years ago

Yes! In preliminary runs, InMemoryEstimator does not show the same problem. What is causing this, though? I looked through some of the code, and I think the error might be originating in the Iterator class, right?

It's a bit difficult to make sense of the general code structure, since there are a lot of generics/type arguments, but all in due time!

From the stack trace, line 140 in FileBasedEstimator.scala stands out:

val trainOps = Op.createWithNameScope("Model")(model.buildTrainOps())

Following that rabbit hole led me to the Iterator class, which seems to have an Iterator.iteratorGetNext method.
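To make the precondition in the error message concrete, here is a minimal toy sketch of the contract it describes: an iterator whose state only exists after an explicit initializer has run. This is not TF-Scala's code; `ToyIterator`, `FailedPrecondition`, `initialize` and `getNext` are hypothetical names made up for illustration, and the sketch says nothing about why the initializer is sometimes skipped in FileBasedEstimator.

```scala
// Toy model only: none of these names come from TF-Scala or TensorFlow.
final class FailedPrecondition(message: String) extends RuntimeException(message)

// An iterator over a dataset whose state is created by an explicit
// initializer, mirroring the contract described in the error message.
final class ToyIterator[T](dataset: Seq[T]) {
  // No state exists until initialize() has been called.
  private var state: Option[Iterator[T]] = None

  // Plays the role of the initializer op: (re)creates the iterator state.
  def initialize(): Unit = {
    state = Some(dataset.iterator)
  }

  // Plays the role of GetNext: fails if initialize() has not been run.
  def getNext(): T = state match {
    case Some(it) if it.hasNext => it.next()
    case Some(_) => throw new NoSuchElementException("End of sequence")
    case None => throw new FailedPrecondition(
      "GetNext() failed because the iterator has not been initialized.")
  }
}

object ToyIteratorDemo {
  def main(args: Array[String]): Unit = {
    val iterator = new ToyIterator(Seq(1, 2, 3))
    // Asking for an element before initializing reproduces the failure mode.
    val early = scala.util.Try(iterator.getNext())
    println(s"getNext before initialize failed: ${early.isFailure}")
    // After the initializer has run, elements come back in order.
    iterator.initialize()
    println(s"first element after initialize: ${iterator.getNext()}")
  }
}
```

In the real library the iterator state presumably lives in the session rather than in an object field, so the initializer would have to run in the same session as the GetNext op and before it. That is my assumption from the error text, not something I have confirmed in the code.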

I'm using the issues I file to get a better understanding of the structure of TF-Scala. Hopefully, in some time, I can get a more organised view of how the code base is structured.

Thanks for the tip!

eaplatanios commented 6 years ago

@mandar2812 I just got back and I'll look into this today. Thanks a lot for taking the time to try and figure out what's wrong. It's true that there are parts of the library architecture that may be a bit hard to grasp, but I'm here to answer any questions you may have and I also plan to add some more documentation explaining the architecture a bit, soon. It would be really helpful if you could log some of the issues and experience you've had trying to understand what's going on, so I can cover them there. :)

eaplatanios commented 6 years ago

@mandar2812 I think this is fixed in the last commit. Could you please confirm? :)

mandar2812 commented 6 years ago

@eaplatanios Great! Can you update the tensorflow_scala artifact on Sonatype so I can verify this?

eaplatanios commented 6 years ago

I'm making some more edits and I'll update them very soon -- either tonight or tomorrow. :)