Visual-Behavior / detr-tensorflow

Tensorflow implementation of DETR : Object Detection with Transformers
MIT License

Paddings must be non-negative: 0 -7 [[{{node Pad_2}}]] [Op:IteratorGetNext] #28

Open simpad2409 opened 3 years ago

simpad2409 commented 3 years ago

Hello everyone, I tried to run the finetuning of DETR with the dataset indicated in the tutorial ("hardhat") and it seems to work fine. Now I have replaced the initial dataset with one of my interest, namely "Crowd Human". Suddenly, however, I get this error (sometimes it even manages to get to validation step 3 or 4):

`Load weights from weights/detr/detr.ckpt
Model: "detr_finetuning"

Layer (type)                     Output Shape          Param #     Connected to
input_2 (InputLayer)             [(None, None, None,   0
detr (Functional)                (6, None, 100, 256)   41449152    input_2[0][0]
pos_layer (Sequential)           (6, None, 100, 4)     132612      detr[0][0]
cls_layer (Dense)                (6, None, 100, 2)     514         detr[0][0]
tf_op_layer_strided_slice_6 (Te  [(None, 100, 4)]      0           pos_layer[0][0]
tf_op_layer_strided_slice_5 (Te  [(None, 100, 2)]      0           cls_layer[0][0]
tf_op_layer_strided_slice_8 (Te  [(None, 100, 4)]      0           pos_layer[0][0]
tf_op_layer_strided_slice_7 (Te  [(None, 100, 2)]      0           cls_layer[0][0]
tf_op_layer_strided_slice_10 (T  [(None, 100, 4)]      0           pos_layer[0][0]
tf_op_layer_strided_slice_9 (Te  [(None, 100, 2)]      0           cls_layer[0][0]
tf_op_layer_strided_slice_12 (T  [(None, 100, 4)]      0           pos_layer[0][0]
tf_op_layer_strided_slice_11 (T  [(None, 100, 2)]      0           cls_layer[0][0]
tf_op_layer_strided_slice_14 (T  [(None, 100, 4)]      0           pos_layer[0][0]
tf_op_layer_strided_slice_13 (T  [(None, 100, 2)]      0           cls_layer[0][0]
tf_op_layer_strided_slice_4 (Te  [(None, 100, 4)]      0           pos_layer[0][0]
tf_op_layer_strided_slice_3 (Te  [(None, 100, 2)]      0           cls_layer[0][0]

Total params: 41,582,278
Trainable params: 41,476,038
Non-trainable params: 106,240


/user/spadula/DETECTOR/detr-tensorflow-main/detr_tf/loss/compute_map.py:101: RuntimeWarning: invalid value encountered in true_divide
  overlaps = intersections / union
Validation step: [0], ce: [0.78] giou : [1.13] l1 : [0.93] time : [0.00]
Traceback (most recent call last):
  File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/eager/context.py", line 2102, in execution_mode
    yield
  File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 758, in _next_internal
    output_shapes=self._flat_output_shapes)
  File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2610, in iterator_get_next
    _ops.raise_from_not_ok_status(e, name)
  File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6843, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Paddings must be non-negative: 0 -7 [[{{node Pad_2}}]] [Op:IteratorGetNext]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "finetune_crowd_human.py", line 99, in <module>
    run_finetuning(config)
  File "finetune_crowd_human.py", line 79, in run_finetuning
    training.eval(detr, valid_dt, config, class_names, evaluation_step=100)
  File "/user/spadula/DETECTOR/detr-tensorflow-main/detr_tf/training.py", line 72, in eval
    for val_step, (images, t_bbox, t_class) in enumerate(valid_dt):
  File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 736, in __next__
    return self.next()
  File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 772, in next
    return self._next_internal()
  File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 764, in _next_internal
    return structure.from_compatible_tensor_list(self._element_spec, ret)
  File "/usr/lib/python3.6/contextlib.py", line 99, in __exit__
    self.gen.throw(type, value, traceback)
  File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/eager/context.py", line 2105, in execution_mode
    executor_new.wait()
  File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/eager/executor.py", line 67, in wait
    pywrap_tfe.TFE_ExecutorWaitForAllPendingNodes(self._handle)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Paddings must be non-negative: 0 -7 [[{{node Pad_2}}]]`

What does it mean, and how can I solve the problem?

Thank you so much in advance. Greetings, Simone.

PhanTask commented 3 years ago

This normally means some samples from your own dataset have more than 100 objects (in this case, 107). That is why the error reports -7: the loader is trying to pad the objects from 107 to a fixed length of 100, which would require a padding of 100 - 107 = -7.
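The arithmetic behind the message can be sketched in a few lines. This is only an illustration under assumptions: `padding_needed` and `pad_boxes` are hypothetical helpers, not the repository's `pad_labels` function, and boxes are assumed to be plain `(x, y, w, h)` tuples.

```python
# Hypothetical illustration of fixed-length label padding; not the repo's pad_labels.
MAX_OBJECTS = 100  # assumed label length, matching the 100 object queries


def padding_needed(num_objects, max_objects=MAX_OBJECTS):
    """Number of padding rows needed to reach max_objects (negative if too many)."""
    return max_objects - num_objects


def pad_boxes(boxes, max_objects=MAX_OBJECTS, fill=(0.0, 0.0, 0.0, 0.0)):
    """Pad a list of (x, y, w, h) boxes to exactly max_objects entries."""
    missing = padding_needed(len(boxes), max_objects)
    if missing < 0:
        # This is the situation the TensorFlow Pad op rejects.
        raise ValueError(
            f"{len(boxes)} objects exceed the limit of {max_objects} "
            f"(padding would be {missing})"
        )
    return list(boxes) + [fill] * missing


if __name__ == "__main__":
    print(padding_needed(30))   # a typical image: positive padding
    print(padding_needed(107))  # a crowded image: negative padding
```

Checking the annotation count before padding, as `pad_boxes` does, turns the opaque `Pad_2` failure into a message that names the offending size.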

One solution would be to increase both the number of object queries and the padding length, which are defined in the DETR class and the pad_labels function, respectively.

simpad2409 commented 3 years ago

@PhanTask Thanks so much for the answer... Yes, apparently the problem seems to be just that! I tried to change the parameters you said, but I get this error:

Load weights from weights/detr/detr.ckpt
Traceback (most recent call last):
  File "finetune_crowd_human.py", line 99, in <module>
    run_finetuning(config)
  File "finetune_crowd_human.py", line 49, in run_finetuning
    detr = build_model(config)
  File "finetune_crowd_human.py", line 41, in build_model
    detr = get_detr_model(config, include_top=False, nb_class=2, weights="detr", num_decoder_layers=6, num_encoder_layers=6)
  File "/user/spadula/DETECTOR/detr-tensorflow-main/detr_tf/networks/detr.py", line 175, in get_detr_model
    hs = transformer(input_proj(x), masks, query_embed(None), pos_encoding)[0]
  File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 982, in __call__
    self._maybe_build(inputs)
  File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 2643, in _maybe_build
    self.build(input_shapes)  # pylint:disable=not-callable
  File "/user/spadula/DETECTOR/detr-tensorflow-main/detr_tf/networks/custom_layers.py", line 64, in build
    initializer=tf.keras.initializers.GlorotUniform(), trainable=True)
  File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 614, in add_weight
    caching_device=caching_device)
  File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 724, in _add_variable_with_custom_getter
    name=name, shape=shape)
  File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 791, in _preload_simple_restoration
    checkpoint_position=checkpoint_position, shape=shape)
  File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 75, in __init__
    self.wrapped_value.set_shape(shape)
  File "/user/spadula/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1209, in set_shape
    (self.shape, shape))
ValueError: Tensor's shape (100, 256) is not compatible with supplied shape (500, 256)

It seems we need to change something else too...

Since you have been so kind, I will take this opportunity to ask another question: once the finetuning is done this way, where can I find the "retrained" model, so that I can use it?

Thanks in advance for your availability!

PhanTask commented 3 years ago

@simpad2409 I see you changed the number of object queries from 100 to 500. Note that if you do so, the structure of the transformer changes and you can no longer load the pretrained DETR weights: they were trained for 100 queries, which causes the shape mismatch. In other words, you cannot use the provided DETR weights for fine-tuning directly. In the call detr = get_detr_model(config, include_top=False, nb_class=2, weights="detr", num_decoder_layers=6, num_encoder_layers=6), weights="detr" should become weights=None.

However, you may still preload weights for some specific layers that are not affected by object query numbers such as the ResNet backbone part. You may extract this part of the weight and preload it for your backbone.

To save and load the fine-tuned model, you can use the model.save_weights() and model.load_weights() functions.

thibo73800 commented 3 years ago

If you really want to be able to handle more than 100 objects, the best move might be to extend num_queries while keeping the weights of the first 100 pretrained queries.
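A minimal sketch of that idea, under stated assumptions: the query embedding is treated as a table of shape (num_queries, 256), plain Python lists stand in for the real weight arrays, and `extend_query_embeddings` is a hypothetical helper rather than anything provided by detr-tensorflow. The new rows are drawn from a small Gaussian, which is just one possible initialisation.

```python
import random


def extend_query_embeddings(pretrained, new_num_queries, scale=0.02, seed=0):
    """Return a larger embedding table whose first rows are the pretrained ones.

    pretrained: list of rows (each a list of floats), e.g. 100 x 256.
    Rows beyond len(pretrained) are freshly initialised (assumed Gaussian init).
    """
    old_num_queries = len(pretrained)
    if new_num_queries < old_num_queries:
        raise ValueError("new_num_queries must be >= the pretrained query count")
    dim = len(pretrained[0])
    rng = random.Random(seed)
    extra = [
        [rng.gauss(0.0, scale) for _ in range(dim)]
        for _ in range(new_num_queries - old_num_queries)
    ]
    # Copy the pretrained rows so the original table is left untouched.
    return [list(row) for row in pretrained] + extra


if __name__ == "__main__":
    # Stand-in for the pretrained (100, 256) query table.
    pretrained = [[0.0] * 256 for _ in range(100)]
    extended = extend_query_embeddings(pretrained, 500)
    print(len(extended), len(extended[0]))
```

With real weights, the extended table would then have to be assigned to the query embedding variable of a model built with the larger query count, while every other layer is loaded from the checkpoint as usual.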

It might be a good idea to look at the distribution of the number of objects per image in the Crowd Human dataset to decide on the right thing to do.
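One way to get that distribution is to count annotation rows per image. The sketch below assumes a CSV with one row per bounding box and a `filename` column; both the column name and the helper names are assumptions, so adjust them to the actual annotation file. Images without any annotation do not appear in such a file and are therefore not counted.

```python
import csv
import io
import statistics
from collections import Counter


def objects_per_image(csv_file, image_column="filename"):
    """Count annotation rows per image in a one-row-per-box CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row[image_column] for row in reader)


def summarize(counts, limit=100):
    """Basic statistics on the per-image object counts."""
    values = sorted(counts.values())
    return {
        "images": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "max": values[-1],
        "over_limit": sum(1 for v in values if v > limit),
    }


if __name__ == "__main__":
    # Tiny in-memory example; replace with open("train/_annotations.csv", newline="").
    sample = io.StringIO(
        "filename,xmin,ymin,xmax,ymax,class\n"
        "a.jpg,0,0,10,10,person\n"
        "a.jpg,5,5,20,20,person\n"
        "b.jpg,1,1,8,8,person\n"
    )
    print(summarize(objects_per_image(sample)))
```

The `over_limit` entry shows how many images would trigger the negative-padding error for a given query count.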

simpad2409 commented 3 years ago

@PhanTask @thibo73800 Thanks to both of you for answering me. You are really kind! Below is a graph showing the number of people for each image (and therefore gt) of the CrowdHuman dataset. What do you think? Is it possible to increase the number of queries? If so, how?

Thanks again to both of you!

[Image: screenshot (2021-07-05) of the histogram of the number of people per image in the CrowdHuman dataset]

thibo73800 commented 3 years ago

Based on this histogram, the mean might be roughly 30, while the median should be around 20. So the question is: do you want your model to be able to predict more than 100 people (even though this is rarely the case)? Or do you want your model to always detect the 25 or 50 most obvious people?

simpad2409 commented 3 years ago

@thibo73800 I would like to create a detector that can correctly detect as many people as possible within a scene, in order to estimate the distance between them and build a social distancing system in the context of Covid-19. I'm not sure I have conveyed the idea...

PhanTask commented 3 years ago

@simpad2409 My feeling is that the detection results with more than 100 people in one view, considering the image resolution, are not very reliable. Even COCO metrics only calculate the accuracy of the top 100 detection results. I would suggest you either skip these samples with more than 100 people (treated as outliers) or only keep the top 100 most visually salient objects in these samples. A third option would be patching your data into smaller patches that contain fewer objects if you really want to detect more than 100 people in one sample.
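The first two options can be sketched as simple preprocessing steps. This is a rough illustration only: samples are assumed to be `(name, boxes)` pairs with `(x, y, w, h)` boxes, the helper names are hypothetical, and box area is used as a crude stand-in for visual saliency, which the thread does not define.

```python
MAX_OBJECTS = 100  # assumed limit, matching the 100 object queries


def box_area(box):
    """Area of an (x, y, w, h) box."""
    _, _, w, h = box
    return w * h


def drop_crowded(samples, max_objects=MAX_OBJECTS):
    """Option 1: skip samples that contain more than max_objects boxes."""
    return [(name, boxes) for name, boxes in samples if len(boxes) <= max_objects]


def keep_largest(boxes, max_objects=MAX_OBJECTS):
    """Option 2: keep only the max_objects largest boxes (area as a saliency proxy)."""
    return sorted(boxes, key=box_area, reverse=True)[:max_objects]


if __name__ == "__main__":
    small = ("small.jpg", [(0, 0, 10, 20)] * 30)
    crowded = ("crowded.jpg", [(0, 0, i + 1, i + 1) for i in range(107)])
    print([name for name, _ in drop_crowded([small, crowded])])
    print(len(keep_largest(crowded[1])))
```

Dropping samples is the simplest choice; truncating keeps the image in the training set but leaves some visible people unlabeled, which the model may then learn to treat as background.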

simpad2409 commented 3 years ago

@PhanTask @thibo73800 Thanks for the reply. :) I followed the advice and kept only the images with at most 100 people. THANK YOU! I have another question... I started finetuning on my CrowdHuman dataset to improve DETR's detection of people. I have a validation set of about 3000 images; should I pass all of it? This is what I do:

valid_dt, _ = load_tfcsv_dataset(config, 3000, augmentation=False, ann_file="test/_annotations.csv", img_dir="test")

(I noticed, however, that at each validation ALL 3000 images are seen by the network, and not just a few at a time, with a different subset for the validation at the end of each epoch.)

Am I doing something wrong? What would you advise for a "good" finetuning for the detection of people? Thank you so much.