RobertCsordas / RFCN-tensorflow

RFCN implementation in TensorFlow

Errors for training! #30

Open balzac001 opened 6 years ago

balzac001 commented 6 years ago

Hello Xdever! test.py runs fine, but main.py fails with errors. I downloaded MS COCO 2014 and extracted it, then ran the following command:

python main.py -dataset /home/hfl/RFCN-tensorflow-master-test1/COCO -name /home/hfl/RFCN-tensorflow-master-test1/export2

This produced the following output:

WARNING: Variable "InceptionResnetV2/Conv2d_7b_1x1/weights/Adam" not found in file to load.
WARNING: Variable "InceptionResnetV2/Conv2d_7b_1x1/weights/Adam_1" not found in file to load.
WARNING: Variable "InceptionResnetV2/Conv2d_7b_1x1/BatchNorm/beta/Adam" not found in file to load.
WARNING: Variable "InceptionResnetV2/Conv2d_7b_1x1/BatchNorm/beta/Adam_1" not found in file to load.
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/beta
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Logits/biases
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/weights
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Logits/weights
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/weights
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/BatchNorm/moving_variance
WARNING: Unused variable: InceptionResnetV2/Logits/Logits/biases
WARNING: Unused variable: InceptionResnetV2/Logits/Logits/weights
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/BatchNorm/moving_mean
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_variance
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_mean
WARNING: Unused variable: global_step
WARNING: Unused variable: InceptionResnetV2/AuxLogits/Conv2d_1b_1x1/BatchNorm/beta
Done.
BoxLoader: Loaded 123287 files.
2018-03-16 21:19:23.107145: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-03-16 21:19:23.107192: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-03-16 21:19:23.107238: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
2018-03-16 21:19:23.107249: W tensorflow/core/kernels/queue_base.cc:277] _0_dataset/fifo_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
  File "main.py", line 157, in <module>
    res = runManager.modRun(i)
  File "/home/hfl/RFCN-tensorflow-master-test1/Utils/RunManager.py", line 97, in modRun
    return self.runAndMerge(feed_dict, options=options if options is not None else self.options, run_metadata=run_metadata if run_metadata is not None else self.run_metadata)
  File "/home/hfl/RFCN-tensorflow-master-test1/Utils/RunManager.py", line 71, in runAndMerge
    res = self.sess.run(self.inputTensors, feed_dict=feed_dict, options=options, run_metadata=run_metadata)
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1137, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
    options, run_metadata)
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input to reshape is a tensor with 9489 values, but the requested shape has 12356
  [[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape/tensor, optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Shape)]]
  [[Node: optimizer/gradients/cond/getRefinementLoss/cond/getPosLoss/boxesRefinementLoss/refineBoxes/roiMean/positionSensitiveRoiPooling/PosRoiPooling_grad/Shape/_2039 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge6985...grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op u'optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape', defined at:
  File "main.py", line 119, in <module>
    trainOp=createUpdateOp()
  File "main.py", line 106, in createUpdateOp
    grads = optimizer.compute_gradients(totalLoss, var_list=net.getVariables())
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 460, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 611, in gradients
    lambda: grad_fn(op, out_grads))
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 377, in _MaybeCompile
    return grad_fn() # Exit early
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py", line 611, in <lambda>
    lambda: grad_fn(op, out_grads))
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/array_grad.py", line 504, in _ReshapeGrad
    return [array_ops.reshape(grad, array_ops.shape(op.inputs[0])), None]
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 3903, in reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
    op_def=op_def)
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
    self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

...which was originally created as op u'RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2', defined at:
  File "main.py", line 99, in <module>
    tf.losses.add_loss(net.getLoss(boxes, classes))
  File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/BoxNetwork.py", line 50, in getLoss
    return self.rpn.loss(refBoxes) + self.boxRefiner.loss(self.proposals, refBoxes, refClasses)
  File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/RPN.py", line 176, in loss
    return tf.cond(tf.shape(refBoxes)[0]>0, lambda: calcLoss(), lambda: tf.constant(0.0))
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
    return func(*args, **kwargs)
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2018, in cond
    orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1868, in BuildCondBranch
    original_result = fn()
  File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/RPN.py", line 176, in <lambda>
    return tf.cond(tf.shape(refBoxes)[0]>0, lambda: calcLoss(), lambda: tf.constant(0.0))
  File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/RPN.py", line 163, in calcLoss
    positiveLosses, negativeLosses = calcAllLosses(inAnchros, inBoxes, inRawSizes, inScores, inBoxSizes)
  File "/home/hfl/RFCN-tensorflow-master-test1/BoxEngine/RPN.py", line 142, in calcAllLosses
    classificationLoss = tf.nn.softmax_cross_entropy_with_logits(logits=scores, labels=refScores)
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 250, in new_func
    return func(*args, **kwargs)
  File "/home/hfl/.local/lib/python2.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1960, in softmax_cross_entropy_with_logits
    labels=labels, logits=logits, dim=dim, name=name)

InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 9489 values, but the requested shape has 12356
  [[Node: optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Reshape/tensor, optimizer/gradients/RPNloss/cond/calcRPNLoss/calcAllRPNLosses/softmax_cross_entropy_with_logits_sg/Reshape_2_grad/Shape)]]
  [[Node: optimizer/gradients/cond/getRefinementLoss/cond/getPosLoss/boxesRefinementLoss/refineBoxes/roiMean/positionSensitiveRoiPooling/PosRoiPooling_grad/Shape/_2039 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge6985...grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

zhangxixi0904 commented 6 years ago

I got the same error: Input to reshape is a tensor with 9489 values, but the requested shape has 12356
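For readers unfamiliar with this message: a reshape is only valid when the input holds exactly as many elements as the requested shape. In the traceback, the failing op is the gradient of a reshape, which reshapes the incoming gradient to the shape of the forward op's input, so the two sizes in the message (9489 and 12356) are the element counts that disagreed. The pure-Python sketch below only illustrates that size rule; `element_count` and `can_reshape` are hypothetical helper names, the shapes are made up for illustration, and it does not diagnose why the sizes differ in this repository.

```python
from functools import reduce
import operator


def element_count(shape):
    """Number of elements a tensor of the given shape holds."""
    return reduce(operator.mul, shape, 1)


def can_reshape(num_values, requested_shape):
    """A reshape is valid only if both sides hold the same number of elements."""
    return num_values == element_count(requested_shape)


# The two sizes come from the error message; the shapes are illustrative only.
print(can_reshape(9489, (12356,)))    # False: 9489 != 12356
print(can_reshape(9489, (3163, 3)))   # True: 3163 * 3 == 9489
```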

balzac001 commented 6 years ago

I think this error is related to the CUDA and TensorFlow versions. I tried the program on another computer with CUDA 8 and TF 1.2 installed, and the error does not occur there. My own computer has CUDA 9 and TF 1.4, and I don't know how to debug this!
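When comparing two machines like this, it can help to record the versions from inside Python rather than from memory. The following is a minimal sketch, not something from the repository: `describe_environment` is a hypothetical helper, and it relies only on `tf.__version__` and `tf.test.is_built_with_cuda()`. It does not report the CUDA toolkit version itself, which has to be checked separately (for example with `nvcc --version`).

```python
import importlib
import platform


def describe_environment():
    """Collect version details worth comparing between two machines."""
    info = {"python": platform.python_version()}
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        # TensorFlow is missing or could not be loaded on this machine.
        info["tensorflow"] = None
        return info
    info["tensorflow"] = tf.__version__
    # True when this TensorFlow build was compiled with CUDA support.
    info["built_with_cuda"] = tf.test.is_built_with_cuda()
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print("{}: {}".format(key, value))
```

Running it on both computers and comparing the output line by line shows whether the Python or TensorFlow versions differ. It cannot by itself confirm that the version difference causes the reshape error.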

zhangxixi0904 commented 6 years ago

Thank you so much for your advice! I am also wondering whether this is a version incompatibility. @balzac001

timtian12 commented 6 years ago

Did you get it running on CUDA 9 in the end?

balzac001 commented 6 years ago

No, I used Faster RCNN and YOLO instead. I couldn't solve this error!