dragen1860 / TensorFlow-2.x-Tutorials

TensorFlow 2.x tutorials and examples, including CNN, RNN, GAN, Auto-Encoders, FasterRCNN, GPT, BERT examples, etc. Introductory example code and hands-on tutorials for TF 2.0.
6.38k stars, 2.24k forks

Error in fasterRCNN #13

Open maggiezha opened 5 years ago

maggiezha commented 5 years ago

After installing all the packages, I ran train_model.py and got this error:

2.0.0-dev20190519
loading annotations into memory...
Traceback (most recent call last):
  File "train_model.py", line 28, in <module>
    scale=(800, 1216))
  File "/home/maggie/Desktop/TensorFlow-2.x-Tutorials/lesson26-fasterRCNN/detection/datasets/coco.py", line 33, in __init__
    self.coco = COCO("{}/annotations/instances_{}2017.json".format(dataset_dir, subset))
  File "/home/maggie/.local/lib/python3.6/site-packages/pycocotools/coco.py", line 84, in __init__
    dataset = json.load(open(annotation_file, 'r'))
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/llong/datasets/coco2017//annotations/instances_train2017.json'
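For readers hitting the same FileNotFoundError, here is a minimal sketch of a check that could run before the dataset is constructed. It rebuilds the annotation path with the format string shown in the traceback. The helper names `annotation_path` and `check_annotations` are hypothetical and not part of the repository.

```python
import os


def annotation_path(dataset_dir, subset):
    """Build the path with the same format string the traceback shows in coco.py."""
    return "{}/annotations/instances_{}2017.json".format(dataset_dir, subset)


def check_annotations(dataset_dir, subset="train"):
    """Return the annotation path, or raise a descriptive error if it is missing.

    Hypothetical helper for illustration; the tutorial code does not define it.
    """
    path = annotation_path(dataset_dir, subset)
    if not os.path.isfile(path):
        raise FileNotFoundError(
            "Expected COCO annotations at {!r}; point dataset_dir at the "
            "folder that contains 'annotations/'".format(path)
        )
    return path
```

Calling `check_annotations(dataset_dir)` with the directory configured in train_model.py should fail early with a readable message if the hard-coded path does not exist on the local machine.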

Then I found some information on this page, so I downloaded the annotations from the COCO website and changed the annotation directory, as the second commenter there suggested: https://github.com/cocodataset/cocoapi/issues/191

But then I got another error:

Traceback (most recent call last):
  File "train_model.py", line 8, in <module>
    from detection.datasets import coco, data_generator
  File "/home/maggie/Desktop/TensorFlow-2.x-Tutorials/lesson26-fasterRCNN/detection/datasets/coco.py", line 38
    self.cat_ids = self.coco.getCatIds()
    ^
IndentationError: unexpected indent
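An IndentationError like this one often appears after a hand edit leaves one line indented differently from its neighbours, although that is only an assumption about this case. A minimal sketch for locating the offending line without running the training script follows; `find_indentation_error` is a hypothetical helper name.

```python
def find_indentation_error(source, filename="<source>"):
    """Return (line_number, message) for the first indentation problem, or None.

    Only IndentationError (and its subclass TabError) is caught; any other
    SyntaxError is left to propagate to the caller.
    """
    try:
        compile(source, filename, "exec")
    except IndentationError as err:
        return err.lineno, err.msg
    return None


# Example: the last line is indented deeper than the statement before it.
snippet = (
    "class Dataset:\n"
    "    def load(self):\n"
    "        self.a = 1\n"
    "            self.b = 2\n"
)
print(find_indentation_error(snippet))
```

To check a file, read it and pass its text, for example `find_indentation_error(open("coco.py").read(), "coco.py")`. The standard-library `tabnanny` module (`python -m tabnanny coco.py`) can additionally flag ambiguous mixes of tabs and spaces.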

dragen1860 commented 5 years ago

It's just an indentation error. I don't know why it happens, but I suggest you fix it yourself. I will test it later. Thanks.

maggiezha commented 5 years ago

Thanks, the problem was solved when I modified the data dir in train_model.py (last time I modified coco.py, which did not work). Training ran until epoch 0 200 1.8066142 and then reported:

tensorflow/core/framework/tensor.cc:755] Type not set
Aborted (core dumped)

I am not sure whether this means it ran out of memory. Even when I used export CUDA_VISIBLE_DEVICES=2 to select my GV100 GPU, which has 32GB of memory, it did not work: nvidia-smi shows that the GV100 was not used, and training always runs on the TITAN V, which has only 12GB of memory.
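One possible reason the wrong card is picked, offered as an assumption and not a confirmed diagnosis: CUDA numbers devices fastest-first by default, while nvidia-smi lists them in PCI bus order, so index 2 may not refer to the same GPU in both. The sketch below pins the ordering before TensorFlow is imported. `select_gpu` is a hypothetical helper, and the index 2 is taken from the comment above. It does not address the "Type not set" abort itself.

```python
import os


def select_gpu(index):
    """Expose a single GPU, numbered the way nvidia-smi numbers it.

    Must run before TensorFlow (or any other CUDA library) is imported,
    because the variables are read when CUDA initializes.
    """
    # Make CUDA enumerate devices in PCI bus order instead of fastest-first.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    # Hide every device except the requested one.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


select_gpu(2)  # assumed index; use the one nvidia-smi shows for the GV100
# import tensorflow as tf  # import only after the variables are set
```

The same effect can be had from the shell by exporting both variables before launching train_model.py. Inside the process the selected card then appears as GPU 0.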

Remember2018 commented 4 years ago

Hello, have you trained the FasterRCNN on COCO2017? If convenient, could you please share the final detection performance? Thanks.

moulicm111 commented 4 years ago

Please explain this error

2.1.0
loading annotations into memory...
Done (t=0.08s)
creating index...
index created!
Traceback (most recent call last):
  File "train_model.py", line 50, in <module>
    _ = model((batch_imgs, batch_metas), training=False)
  File "/home/advancedtf/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 822, in __call__
    outputs = self.call(cast_inputs, *args, **kwargs)
  File "/home/16-fasterRCNN/detection/models/detectors/faster_rcnn.py", line 157, in call
    rcnn_probs_list, rcnn_deltas_list, rois_list, img_metas)
  File "/home/16-fasterRCNN/detection/models/bbox_heads/bbox_head.py", line 121, in get_bboxes
    for i in range(img_metas.shape[0])
  File "/home/16-fasterRCNN/detection/models/bbox_heads/bbox_head.py", line 121, in <listcomp>
    for i in range(img_metas.shape[0])
  File "/home/16-fasterRCNN/detection/models/bbox_heads/bbox_head.py", line 188, in _get_bboxes_single
    nms_keep = tf.concat(nms_keep, axis=0)
  File "/home/advancedtf/lib/python3.6/site-packages/tensorflow_core/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/advancedtf/lib/python3.6/site-packages/tensorflow_core/python/ops/array_ops.py", line 1517, in concat
    return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
  File "/home/advancedtf/lib/python3.6/site-packages/tensorflow_core/python/ops/gen_array_ops.py", line 1118, in concat_v2
    _ops.raise_from_not_ok_status(e, name)
  File "/home/advancedtf/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 6606, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: OpKernel 'ConcatV2' has constraint on attr 'T' not in NodeDef '[N=0, Tidx=DT_INT32]', KernelDef: 'op: "ConcatV2" device_type: "CPU" constraint { name: "T" allowed_values { list { type: DT_UINT64 } } } host_memory_arg: "axis"' [Op:ConcatV2] name: concat

wjunneng commented 3 years ago

The batch_size is set too large: it is larger than the number of samples, which causes the error above. Lowering batch_size fixes it.
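A minimal sketch of the suggested fix, assuming the number of samples is known before batching: clamp the requested batch size to the dataset size. `effective_batch_size` is a hypothetical helper and not part of the repository.

```python
def effective_batch_size(requested, num_samples):
    """Return a batch size that never exceeds the number of samples."""
    if requested <= 0:
        raise ValueError("requested batch size must be positive")
    if num_samples <= 0:
        raise ValueError("dataset is empty; nothing to batch")
    return min(requested, num_samples)


# Example: asking for 8 images per batch when only 3 are available.
print(effective_batch_size(8, 3))  # -> 3
```

The `N=0` in the ConcatV2 message indicates that the op received an empty list of inputs, so checking that the list passed to `tf.concat` is non-empty may be a useful additional guard.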