dragen1860 / TensorFlow-2.x-Tutorials

TensorFlow 2.x tutorials and examples, including CNN, RNN, GAN, Auto-Encoders, FasterRCNN, GPT, BERT examples, etc. Introductory example code and hands-on tutorials for TF 2.0.
6.38k stars, 2.23k forks

Faster RCNN - InvalidArgumentError: Value for attr 'N' of 0 must be at least minimum 2 #22

Open jcburnel opened 5 years ago

jcburnel commented 5 years ago

Hello, running the code unchanged gives me an error in the inspect_model notebook.

This line:

detections_list = model.bbox_head.get_bboxes(rcnn_probs_list, rcnn_deltas_list, rois_list, batch_metas)

raises this error:

InvalidArgumentError: Value for attr 'N' of 0 must be at least minimum 2 ; NodeDef: {{node ConcatV2}}; Op<name=ConcatV2; signature=values:N*T, axis:Tidx -> output:T; attr=N:int,min=2; attr=T:type; attr=Tidx:type,default=DT_INT32,allowed=[DT_INT32, DT_INT64]> [Op:ConcatV2] name: concat

I am using Python 3.7 and TF 2.0.0-beta1.

c-herring commented 4 years ago

I am also getting this issue. It looks like the network predicts the zeroth class for every proposal, so in effect it is not detecting anything.

In _get_bboxes_single() it first runs

class_ids = tf.argmax(rcnn_probs, axis=1, output_type=tf.int32)

which produces an array of zeros. Then:

keep = tf.where(class_ids > 0)[:, 0]

so keep has length zero, and eventually tf.concat is called on an empty list, which produces the error.

I am just trying to overfit a randomly initialized network, since I cannot download the weights (the Drive link is dead and I cannot download from Baidu).

Perhaps the network is behaving correctly? It may simply have converged to predicting nothing, which then hits a corner case the code does not handle (i.e. no foreground class proposals at all).
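A minimal sketch of that corner case and one possible guard, using NumPy as a stand-in for the TensorFlow ops so it runs without the repository. The helper names select_foreground and concat_or_empty are hypothetical and not part of the repo, and the per-class grouping is only an assumption about how nms_keep is built. np.concatenate fails on an empty list much like tf.concat does, and the same kind of length check could be placed before the tf.concat call in _get_bboxes_single().

```python
import numpy as np


def select_foreground(rcnn_probs):
    """Return the top class per RoI and the indices of RoIs that are not background (class 0)."""
    class_ids = np.argmax(rcnn_probs, axis=1)
    keep = np.where(class_ids > 0)[0]
    return class_ids, keep


def concat_or_empty(chunks, dtype=np.int64):
    """Concatenate index arrays; return an empty index array if the list is empty."""
    if len(chunks) == 0:
        # np.concatenate([]) would raise ValueError here, so return "no detections" instead.
        return np.zeros((0,), dtype=dtype)
    return np.concatenate(chunks, axis=0)


# Every RoI scores highest on class 0 (background), as described above.
rcnn_probs = np.array([[0.90, 0.05, 0.05],
                       [0.80, 0.10, 0.10]])
class_ids, keep = select_foreground(rcnn_probs)

# One chunk of kept indices per foreground class that is present; there are none here.
nms_keep = [keep[class_ids[keep] == c] for c in np.unique(class_ids[keep])]

# With the guard, the empty case yields an empty result instead of an exception.
nms_keep = concat_or_empty(nms_keep)
print(nms_keep.shape)
```

With a guard like this the caller gets an empty detection set for the image, which only hides the symptom: a network that predicts background everywhere still needs to be debugged separately.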

jcburnel commented 4 years ago

I have downloaded the weights and have the same behaviour.

c-herring commented 4 years ago

Ah, interesting. So it predicts nothing even with the trained weights?

Could you possibly share the weights? I spent a couple of hours trying to get a Baidu account, but without a Chinese phone number it seems impossible :(

jcburnel commented 4 years ago

I'll try to share them somehow today.

In fact, the network does predict something if you load the weights and run it on an example without training it again. It is during training that we got a gradient error, and after that the network can't "recover". (Sorry for the lack of detail; it has been a long time since I tried it.)

c-herring commented 4 years ago

Thanks, that explanation makes sense. At least it gives me a good place to start debugging :)

It would be really awesome if you could share the weights. If you have a Google account, you can share them to my Google Drive here: https://drive.google.com/drive/folders/1jdY9u3YHWiuGbtNX7N8I6iVYpBL1kRZj?usp=sharing

hehongjie commented 4 years ago

Hi, have you solved the problem? I am also stuck here.

yunfei1999 commented 1 year ago

I ran into this problem too:

Traceback (most recent call last):
File "E:/obeject_detection/fasterRCNN/trainmodel.py", line 50, in
= model((batch_imgs, batch_metas), training=False)
File "E:\Anaconda3\envs\tf-gpu2.7.0\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "E:\obeject_detection\fasterRCNN\detection\models\detectors\faster_rcnn.py", line 148, in call
detections_list = self.bbox_head.get_bboxes(
File "E:\obeject_detection\fasterRCNN\detection\models\bbox_heads\bbox_head.py", line 117, in get_bboxes
detections_list = [
File "E:\obeject_detection\fasterRCNN\detection\models\bbox_heads\bbox_head.py", line 118, in
self._get_bboxes_single(
File "E:\obeject_detection\fasterRCNN\detection\models\bbox_heads\bbox_head.py", line 187, in _get_bboxes_single
nms_keep = tf.concat(nms_keep, axis=0)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Exception encountered when calling layer "faster_rcnn" (type FasterRCNN).

OpKernel 'ConcatV2' has constraint on attr 'T' not in NodeDef '[N=0, Tidx=DT_INT32]', KernelDef: 'op: "ConcatV2" device_type: "GPU" constraint { name: "T" allowed_values { list { type: DT_UINT64 } } } host_memory_arg: "axis"' [Op:ConcatV2] name: concat

Call arguments received:
• inputs=('tf.Tensor(shape=(1, 1216, 1216, 3), dtype=float32)', 'tf.Tensor(shape=(1, 11), dtype=float32)')
• training=False

yunfei1999 commented 1 year ago

This happens when I run train_model.py.