CharlesShang / FastMaskRCNN

Mask RCNN in TensorFlow
Apache License 2.0

Code stops running after an arbitrary number of iterations in train.py #42

Open sohamghoshmusigma opened 7 years ago

sohamghoshmusigma commented 7 years ago

This is the entire error that the code is throwing. The number of iterations after which it stops changes every time I run the code.

iter 608: image-id:0022482, time:22.124(sec), regular_loss: 0.248815, total-loss 4425.7246(94.0396, 4331.6851, 0.000000, 0.0000, 0.0000), instances: 13, batch:(32|136, 0|64, 0|0)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/script_ops.py", line 82, in __call__
    ret = func(*args)
  File "train/../libs/layers/sample.py", line 99, in sample_rpn_outputs_wrt_gt_boxes
    gt_argmax_overlaps = overlaps.argmax(axis=0) # G
ValueError: attempt to get argmax of an empty sequence
2017-05-02 20:46:23.686108: W tensorflow/core/framework/op_kernel.cc:1152] Internal: Failed to run py callback pyfunc_5: see error log.
Traceback (most recent call last):
  File "train/train.py", line 221, in <module>
    train()
  File "train/train.py", line 173, in train
    batch_info )
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 778, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 982, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1032, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1052, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Failed to run py callback pyfunc_5: see error log.
  [[Node: pyramid_1/SampleBoxesWithGT/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_BOOL], Tout=[DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32], token="pyfunc_5", _device="/job:localhost/replica:0/task:0/cpu:0"](pyramid_1/AnchorDecoder/Reshape, pyramid_1/strided_slice_8, concat, pyramid_1/SampleBoxesWithGT/PyFunc/input_3)]]

Caused by op u'pyramid_1/SampleBoxesWithGT/PyFunc', defined at:
  File "train/train.py", line 221, in <module>
    train()
  File "train/train.py", line 120, in train
    loss_weights=[0.2, 0.2, 1.0, 0.2, 1.0])
  File "train/../libs/nets/pyramid_network.py", line 536, in build
    is_training=is_training, gt_boxes=gt_boxes)
  File "train/../libs/nets/pyramid_network.py", line 246, in build_heads
    sample_rpn_outputs_with_gt(rois, rpn_probs[:, 1], gt_boxes, is_training=is_training)
  File "train/../libs/layers/wrapper.py", line 132, in sample_with_gt_wrapper
    [tf.float32, tf.float32, tf.int32, tf.float32, tf.float32, tf.int32])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/script_ops.py", line 189, in py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_script_ops.py", line 40, in _py_func
    name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 768, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2336, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1228, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Failed to run py callback pyfunc_5: see error log.
  [[Node: pyramid_1/SampleBoxesWithGT/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_BOOL], Tout=[DT_FLOAT, DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_INT32], token="pyfunc_5", _device="/job:localhost/replica:0/task:0/cpu:0"](pyramid_1/AnchorDecoder/Reshape, pyramid_1/strided_slice_8, concat, pyramid_1/SampleBoxesWithGT/PyFunc/input_3)]]
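The first traceback points at `overlaps.argmax(axis=0)` in `libs/layers/sample.py`. NumPy raises exactly this `ValueError` when an argmax is taken along an axis of length zero. Assuming `overlaps` has shape `(N, G)` (N RoIs by G ground-truth boxes, as the `# G` comment suggests), the crash means N was 0 at that iteration, i.e. no RoIs reached the sampler. The sketch below reproduces the error and shows a hypothetical guard; `safe_gt_argmax` is not part of FastMaskRCNN, and it only avoids the crash without explaining why the RoI set was empty.

```python
import numpy as np


def safe_gt_argmax(overlaps):
    """Hypothetical guard for the per-ground-truth argmax.

    Assumes `overlaps` is an (N, G) IoU matrix: N RoIs by G ground-truth boxes.
    Returns a length-G int array with the best RoI index for each ground-truth
    box, or None when there are no RoIs (N == 0), the case in which
    NumPy's argmax(axis=0) raises ValueError.
    """
    overlaps = np.asarray(overlaps)
    if overlaps.ndim != 2:
        raise ValueError("expected a 2-D (N, G) overlap matrix")
    if overlaps.shape[0] == 0:
        # No RoIs survived upstream; the caller has to decide what to do,
        # e.g. skip this batch or use the ground-truth boxes themselves as RoIs.
        return None
    return overlaps.argmax(axis=0)


empty = np.zeros((0, 3), dtype=np.float32)  # 0 RoIs, 3 ground-truth boxes
try:
    empty.argmax(axis=0)  # reproduces the crash from sample.py
except ValueError as err:
    print("unguarded:", err)
print("guarded:", safe_gt_argmax(empty))  # guarded: None
print(safe_gt_argmax(np.array([[0.1, 0.7], [0.5, 0.2]])))  # [1 0]
```

Returning `None` forces the caller to handle the empty case explicitly instead of silently producing a bogus assignment; note that the reverse case (N > 0, G == 0) does not raise and simply yields an empty array.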

Can someone help me out here?

DeepStillWater commented 7 years ago

I have the same problem, but my Python version is 3.5.2. Did you solve it?

sohamghoshmusigma commented 7 years ago

@DeepStillWater

No, I still haven't been able to solve it. The problem may be the same as the referenced issue you see in this thread. I've also noticed that the array is full of NaN values right before the crash. I'm looking into why that happens and how it can be solved. Hope that helps you a bit. Let me know if you come across any suggestions to solve this.
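One plausible (unconfirmed) link between the NaN values and the empty `overlaps` matrix: every comparison involving NaN is False, so a size or score filter silently drops NaN boxes and can leave zero RoIs. The sketch below uses hypothetical helpers `check_finite` and `filter_boxes_by_size` to fail fast on non-finite values and to show a NaN box vanishing through such a filter. The `[x1, y1, x2, y2]` box layout and the `min_size` threshold are assumptions, not taken from the FastMaskRCNN code.

```python
import numpy as np


def check_finite(name, array):
    """Raise early, with a readable message, if `array` holds NaN or Inf."""
    array = np.asarray(array)
    bad = ~np.isfinite(array)
    if bad.any():
        raise FloatingPointError(
            "%s: %d of %d values are non-finite" % (name, bad.sum(), array.size))
    return array


def filter_boxes_by_size(boxes, min_size=2.0):
    """Keep boxes whose width and height are both >= min_size.

    Boxes are assumed to be [x1, y1, x2, y2]. A NaN coordinate makes the
    comparison False, so NaN boxes are dropped without any error.
    """
    boxes = np.asarray(boxes, dtype=np.float32).reshape(-1, 4)
    widths = boxes[:, 2] - boxes[:, 0]
    heights = boxes[:, 3] - boxes[:, 1]
    keep = (widths >= min_size) & (heights >= min_size)
    return boxes[keep]


rois = np.array([[0, 0, 10, 10], [np.nan, 0, 5, 5]], dtype=np.float32)
print(filter_boxes_by_size(rois).shape)  # (1, 4): the NaN box was dropped
print(filter_boxes_by_size(np.full((3, 4), np.nan)).shape)  # (0, 4): nothing left
try:
    check_finite("rois", rois)
except FloatingPointError as err:
    print(err)  # rois: 1 of 8 values are non-finite
```

On the graph side, TensorFlow 1.x provides `tf.check_numerics` (per tensor) and `tf.add_check_numerics_ops` (whole graph), which could be placed on the inputs to `SampleBoxesWithGT` to report the first NaN instead of the later, less informative `argmax` failure.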

mbuffier commented 7 years ago

Hi, I have the same issue. Have you succeeded in solving it? Thank you!

tanchaozhen commented 7 years ago

I have the same issue. Did you solve it? I'm building on CPU only, on Ubuntu 16.04 with Python 2.7 and TensorFlow 1.2.