Closed mandar2812 closed 6 years ago
Hi Mandar,
Could you try using the in-memory estimator and tell me if the same error appears?
Thanks!
On Dec 9, 2017, 5:17 PM -0800, Mandar Chandorkar notifications@github.com, wrote:
When running the cifar and other examples, I get the following exception sporadically (not consistently reproducible):

org.platanios.tensorflow.jni.FailedPreconditionException: GetNext() failed because the iterator has not been initialized. Ensure that you have run the initializer operation for this iterator before getting the next element.
[[Node: Model/Iterator/Next = IteratorGetNextoutput_shapes=[[?,32,32,4], [?]], output_types=[DT_UINT8, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
  org.platanios.tensorflow.jni.Session$.run(Native Method)
  org.platanios.tensorflow.api.core.client.Session.runHelper(Session.scala:137)
  org.platanios.tensorflow.api.learn.SessionWrapper.runHelper(SessionWrapper.scala:114)
  org.platanios.tensorflow.api.core.client.Session.run(Session.scala:76)
  org.platanios.tensorflow.api.learn.estimators.FileBasedEstimator$$anonfun$trainWithHooks$1.apply$mcV$sp(FileBasedEstimator.scala:160)
  org.platanios.tensorflow.api.learn.estimators.FileBasedEstimator$$anonfun$trainWithHooks$1.apply(FileBasedEstimator.scala:135)
  org.platanios.tensorflow.api.learn.estimators.FileBasedEstimator$$anonfun$trainWithHooks$1.apply(FileBasedEstimator.scala:135)
  scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
  org.platanios.tensorflow.api.ops.Op$.createWith(Op.scala:844)
  org.platanios.tensorflow.api.learn.estimators.FileBasedEstimator.trainWithHooks(FileBasedEstimator.scala:135)
  org.platanios.tensorflow.api.learn.estimators.FileBasedEstimator.train(FileBasedEstimator.scala:86)
  ammonite.$file.helios.scripts.eit_goes_cnn$. (eit_goes_cnn.sc:104)
  ammonite.$file.helios.scripts.eit_goes_cnn$. (eit_goes_cnn.sc:15)
Yes! From preliminary runs, InMemoryEstimator does not give the same problem. What is causing this to happen, though? I perused some of the code, and I think the Iterator class is where this might be originating, right?
It's a bit difficult to make sense of the general code structure, since there are a lot of generics/type arguments, but all in due time!
From the stack trace, line 140 in FileBasedEstimator.scala comes to light:
val trainOps = Op.createWithNameScope("Model")(model.buildTrainOps())
Following that rabbit hole led me to the Iterator class, which seems to have an Iterator.iteratorGetNext method.
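To make the precondition in the error message concrete, here is a minimal toy sketch in plain Scala. It is not TF-Scala code: ToyIterator, initialize and getNext are hypothetical names made up for illustration, and it only models the rule that the initializer has to run before the first GetNext() call. It does not explain why the failure is sporadic with FileBasedEstimator.

```scala
import scala.util.Try

// Toy stand-in for an initializable dataset iterator. Every name here is
// hypothetical and is NOT part of the TensorFlow Scala API.
final class ToyIterator[T](data: Seq[T]) {
  // None until initialize() has been called.
  private var cursor: Option[Iterator[T]] = None

  // Plays the role of the iterator's initializer operation.
  def initialize(): Unit = {
    cursor = Some(data.iterator)
  }

  // Plays the role of GetNext(): fails if the initializer has not run yet.
  def getNext(): T = cursor match {
    case Some(it) if it.hasNext => it.next()
    case Some(_) => throw new NoSuchElementException("End of sequence.")
    case None => throw new IllegalStateException(
      "GetNext() failed because the iterator has not been initialized.")
  }
}

object ToyIteratorDemo {
  def main(args: Array[String]): Unit = {
    val iterator = new ToyIterator(Seq(1, 2, 3))
    // Calling getNext() before the initializer reproduces the failure.
    println(Try(iterator.getNext()).isFailure) // prints true
    // Running the initializer first makes the same call succeed.
    iterator.initialize()
    println(iterator.getNext()) // prints 1
  }
}
```

If the real problem is of this kind, the thing to look for would be any code path where the train ops get run before the iterator's initializer has been run in the same session.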
I'm using the issues I file to get a better understanding of TF-Scala. Hopefully, in some time, I can get a more organised view of how the code base is structured.
Thanks for the tip!
@mandar2812 I just got back and I'll look into this today. Thanks a lot for taking the time to try and figure out what's wrong. It's true that there are parts of the library architecture that may be a bit hard to grasp, but I'm here to answer any questions you may have and I also plan to add some more documentation explaining the architecture a bit, soon. It would be really helpful if you could log some of the issues and experience you've had trying to understand what's going on, so I can cover them there. :)
@mandar2812 I think this is fixed in the last commit. Could you please confirm? :)
@eaplatanios Great! Can you update the tensorflow_scala artifact on sonatype so I can verify this?
I'm making some more edits and I'll update them very soon -- either tonight or tomorrow. :)