caicloud / tensorflow-tutorial

Example TensorFlow codes and Caicloud TensorFlow as a Service dev environment.

Chapter 10, multi-GPU parallelism: error partway through training #37

Closed weizhenzhao closed 7 years ago

weizhenzhao commented 7 years ago

    # Define the input queue and return it
    min_after_dequeue = 10000
    capacity = min_after_dequeue + 3 * BATCH_SIZE
    return tf.train.shuffle_batch([retyped_image, label],
                                  batch_size=BATCH_SIZE,
                                  capacity=capacity,
                                  min_after_dequeue=min_after_dequeue)  # <-- the error is reported on this line

The log says:

    Caused by op 'shuffle_batch', defined at:
      File "C:\Users\weizhen\workspace\TextUtil\TFMULTIGPU.py", line 203, in <module>
        tf.app.run()
      File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\platform\app.py", line 43, in run
        sys.exit(main(sys.argv[:1] + flags_passthrough))
      File "C:\Users\weizhen\workspace\TextUtil\TFMULTIGPU.py", line 100, in main
        x, y_ = get_input()
      File "C:\Users\weizhen\workspace\TextUtil\TFMULTIGPU.py", line 59, in get_input
        min_after_dequeue=min_after_dequeue)
      File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\training\input.py", line 917, in shuffle_batch
        dequeued = queue.dequeue_many(batch_size, name=name)
      File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\data_flow_ops.py", line 458, in dequeue_many
        self._queue_ref, n=n, component_types=self._dtypes, name=name)
      File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\ops\gen_data_flow_ops.py", line 1099, in _queue_dequeue_many
        timeout_ms=timeout_ms, name=name)
      File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 759, in apply_op
        op_def=op_def)
      File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 2240, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "C:\Users\weizhen\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\framework\ops.py", line 1128, in __init__
        self._traceback = _extract_stack()

    OutOfRangeError (see above for traceback): RandomShuffleQueue '_2_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 100, current size 0)
      [[Node: shuffle_batch = QueueDequeueMany[_class=["loc:@shuffle_batch/random_shuffle_queue"], component_types=[DT_FLOAT, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]

    E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_gpu_executor.cc:652] Deallocating stream with pending work
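To make the numbers in the error message easier to read, here is a rough, pure-Python model of when a shuffle queue can hand out a batch. It is only a sketch and does not call TensorFlow. `BATCH_SIZE = 100` is an assumption inferred from "requested 100" in the log, and `dequeue_outcome` is a hypothetical helper name. The model assumes that an open queue waits until `min_after_dequeue + batch_size` elements are present, while a closed queue ignores `min_after_dequeue` and fails if fewer than `batch_size` elements remain.

```python
# Assumption: BATCH_SIZE is inferred from "requested 100" in the log above.
BATCH_SIZE = 100
min_after_dequeue = 10000
capacity = min_after_dequeue + 3 * BATCH_SIZE  # same formula as in get_input()


def dequeue_outcome(current_size, closed,
                    batch_size=BATCH_SIZE, min_after=min_after_dequeue):
    """Simplified model (not TensorFlow code) of a dequeue_many request."""
    if closed:
        # A closed queue no longer waits for more data to arrive.
        if current_size >= batch_size:
            return "batch"
        return "OutOfRangeError"
    # An open queue keeps at least `min_after` elements for shuffling.
    if current_size - batch_size >= min_after:
        return "batch"
    return "blocks"


print(capacity)                  # queue capacity under the assumed BATCH_SIZE
print(dequeue_outcome(0, True))  # the situation reported in the log
```

Under this model, "closed" together with "current size 0" means that nothing was ever enqueued before the queue was shut down, which points at the data source rather than at the `shuffle_batch` call itself.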

skx300 commented 7 years ago

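The output.tfrecords file is probably missing. The data-reading function in the book's code expects to read output.tfrecords.

Run the TFRecord_test.py sample code from Chapter 7 and you will find output.tfrecords under /Records. Then point the path at that file.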
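A missing input file can be caught before any queue is built. The following is a minimal sketch using only the standard library; `find_record_files` is a hypothetical helper, not part of the book's code or of TensorFlow, and the path in the usage comment is only an illustration.

```python
import glob
import os


def find_record_files(pattern):
    """Return the non-empty files matching `pattern`, or raise if none exist."""
    files = sorted(
        p for p in glob.glob(pattern)
        if os.path.isfile(p) and os.path.getsize(p) > 0
    )
    if not files:
        raise FileNotFoundError(
            "No non-empty TFRecord file matches %r; generate it first "
            "(for example with the Chapter 7 sample) or fix the path." % pattern
        )
    return files


# Hypothetical usage inside get_input(); the path is an assumption:
# files = find_record_files("Records/output.tfrecords")
```

The returned list could then be passed to whatever builds the filename queue in `get_input()` (in TensorFlow 1.x this is typically `tf.train.string_input_producer`). A wrong path then fails immediately with a clear message instead of surfacing later as an `OutOfRangeError` on the shuffle queue.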

ScorpioCPH commented 7 years ago

OK, thanks for the feedback.

ScorpioCPH commented 7 years ago

@weizhenzhao ping, could you try the approach @skx300 suggested?

weizhenzhao commented 7 years ago

@ScorpioCPH @skx300 thank you. It runs successfully now: http://www.cnblogs.com/weizhen/p/6911261.html