caicloud / tensorflow-tutorial

Example TensorFlow codes and Caicloud TensorFlow as a Service dev environment.

Exception when using tf.train.shuffle_batch #45

Closed: superping closed this issue 6 years ago

superping commented 7 years ago

I am running this on my local machine with the caicloud.clever.tensorflow library; the same code works fine without the caicloud.clever.tensorflow library.

```
W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: You must feed a value for placeholder tensor 'image' with dtype float and shape [12,224,224,3]
	 [[Node: image = Placeholder[dtype=DT_FLOAT, shape=[12,224,224,3], _device="/job:localhost/replica:0/task:0/cpu:0"]()]]
Traceback (most recent call last):
  File "./src/train.py", line 256, in
    distTfRunner.run(train_fn)
  File "/usr/local/lib/python2.7/site-packages/caicloud/clever/tensorflow/dist_base.py", line 250, in run
    should_stop = train_fn(sess, step)
  File "./src/train.py", line 166, in train_fn
    images_array, labels_array = session.run([_train_images, _train_labels])
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: RandomShuffleQueue '_2_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 12, current size 1)
	 [[Node: shuffle_batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]

Caused by op u'shuffle_batch', defined at:
  File "./src/train.py", line 256, in
    distTfRunner.run(train_fn)
  File "/usr/local/lib/python2.7/site-packages/caicloud/clever/tensorflow/dist_base.py", line 208, in run
    model_fn_handler = self._call_model_fn()
  File "/usr/local/lib/python2.7/site-packages/caicloud/clever/tensorflow/dist_base.py", line 169, in _call_model_fn
    model_fn_handler = self._model_fn(False, 1)
  File "./src/train.py", line 83, in model_fn
    _train_images, _train_labels = inputs(FLAGS.batch, FLAGS.train, FLAGS.train_labels)
  File "/Users/xieanping/sourcecode/github/segnet-caicloud/src/inputs.py", line 29, in inputs
    min_after_dequeue=500)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 1165, in shuffle_batch
    name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/training/input.py", line 739, in _shuffle_batch
    dequeued = queue.dequeue_many(batch_size, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/data_flow_ops.py", line 458, in dequeue_many
    self._queue_ref, n=n, component_types=self._dtypes, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1310, in _queue_dequeue_many_v2
    timeout_ms=timeout_ms, name=name)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
    self._traceback = _extract_stack()

OutOfRangeError (see above for traceback): RandomShuffleQueue '_2_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 12, current size 1)
	 [[Node: shuffle_batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]
```
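For context, tf.train.shuffle_batch dequeues batches from a RandomShuffleQueue that is filled by background queue-runner threads, and the OutOfRangeError above is what it raises when that queue is closed before enough examples have been enqueued. Below is a minimal sketch of such a pipeline in the TF 1.x queue API; the file reading, decoding, label, and capacity values are hypothetical and not taken from the reporter's inputs.py.

```python
# Hypothetical sketch of a shuffle_batch input pipeline (TF 1.x queue API).
import tensorflow as tf

def inputs(batch_size, image_files):
    # Queue of input file names; each file is read and decoded as one example.
    filename_queue = tf.train.string_input_producer(image_files)
    reader = tf.WholeFileReader()
    _, value = reader.read(filename_queue)
    image = tf.image.decode_png(value, channels=3)
    image = tf.image.resize_images(image, [224, 224])
    label = tf.zeros([224, 224, 1])  # placeholder label, for illustration only

    # shuffle_batch dequeues `batch_size` examples from a RandomShuffleQueue;
    # it raises OutOfRangeError if the queue is closed with fewer elements.
    return tf.train.shuffle_batch(
        [image, label],
        batch_size=batch_size,
        capacity=1000,
        min_after_dequeue=500)

# The queue only fills once the queue runners are started, e.g.:
#   coord = tf.train.Coordinator()
#   threads = tf.train.start_queue_runners(sess=sess, coord=coord)
```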

lienhua34 commented 7 years ago

@superping Could you share the code of src/train.py? From the error message, it looks like you defined a placeholder but did not feed it when calling session.run().
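For reference, every session.run() that evaluates an op depending on a placeholder must feed it. A minimal sketch of what the "You must feed a value" error refers to (the names and the op below are hypothetical, not taken from the reporter's train.py):

```python
import numpy as np
import tensorflow as tf

# Placeholder matching the shape reported in the error message.
image = tf.placeholder(tf.float32, shape=[12, 224, 224, 3], name='image')
mean_pixel = tf.reduce_mean(image)  # any op that depends on the placeholder

with tf.Session() as sess:
    batch = np.zeros([12, 224, 224, 3], dtype=np.float32)
    # Omitting feed_dict here would raise:
    # "You must feed a value for placeholder tensor 'image' with dtype float and shape [12,224,224,3]"
    sess.run(mean_pixel, feed_dict={image: batch})
```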

lienhua34 commented 7 years ago

@superping I looked into this. It is a bug in the current platform: the error occurs whenever data saved through tf.summary depends on tensors that need to be fed. We have recorded the issue and will fix it in a future release. Thanks for the feedback!

A workaround for now: in the ModelFnHandler object returned by `model_fn`, set summary_op to None so that the TaaS platform does not compute summary information automatically.
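A rough sketch of that workaround follows; the import path, the `model_fn` parameter names, and the ModelFnHandler constructor arguments are assumptions to verify against the caicloud.clever.tensorflow documentation.

```python
# Hypothetical sketch of the workaround; check the library docs for the
# exact ModelFnHandler signature and where it should be imported from.
from caicloud.clever.tensorflow import dist_base

def model_fn(sync, num_replicas):
    # ... build the graph, placeholders, loss, and train op here ...

    # Returning summary_op=None asks the TaaS platform not to compute
    # summaries automatically, avoiding the unfed-placeholder error.
    return dist_base.ModelFnHandler(summary_op=None)
```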

caicloud-bot commented 6 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with a /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

/lifecycle stale

caicloud-bot commented 6 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

caicloud-bot commented 6 years ago

Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten.

/close