AlbertoSabater / Robust-and-efficient-post-processing-for-video-object-detection

GNU General Public License v3.0

NMS implementation #12

Closed ZiyanZhu1994 closed 3 years ago

ZiyanZhu1994 commented 3 years ago

Hi, I am trying to apply REPP to custom videos. I noticed there are some overlapping bboxes for the same object. Should I apply NMS to the outputs of REPP, or is there an NMS module built into REPP?

Thanks in advance!

AlbertoSabater commented 3 years ago

Hi, NMS should be applied before REPP. This is the setup used for YOLOv3 in the paper. If you still find some overlapping boxes, you can modify the `iou_thr` param of NMS.
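In case it helps, here is a minimal NumPy sketch of greedy NMS (not the repo's implementation; the `iou_thr` argument plays the same role as the parameter mentioned above):

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS. boxes: (N, 4) as [x1, y1, x2, y2]; scores: (N,)."""
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection of the top box with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou < iou_thr]  # suppress boxes overlapping too much
    return keep
```

Raising `iou_thr` keeps more overlapping boxes; lowering it suppresses more aggressively.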

Another cause of overlapping might be the prediction extraction from tubelets. There, a separate prediction is generated for each class with a score above `min_pred_score`, but the bbox coordinates are the same for all of them. You can try increasing this threshold, or output only the detection generated for the most confident class.
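A rough sketch of the "keep only the most confident class" filter (the dict schema here is hypothetical, not REPP's actual output format):

```python
def keep_top_class(preds, min_pred_score=0.3):
    # preds: hypothetical per-tubelet detections, one entry per class that
    # share the same bbox coordinates, e.g. {'bbox': (...), 'score': s, 'class': c}
    best = {}
    for p in preds:
        if p['score'] < min_pred_score:
            continue  # drop low-confidence class predictions
        key = tuple(p['bbox'])
        if key not in best or p['score'] > best[key]['score']:
            best[key] = p  # keep only the most confident class per bbox
    return list(best.values())
```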

ZiyanZhu1994 commented 3 years ago

Thank you for your reply! I will modify the `iou_thr` of NMS.

I have another question: I am trying to fine-tune the embedding model, using a different detector backbone instead of YOLO. From your paper, I learned that the embedding model is trained on a triplet dataset. I tried to find the training script for the embedding model, but only found the one for the logistic regression. Could you please point me to the training script for the embedding model?

Thank you!

AlbertoSabater commented 3 years ago

Unfortunately, I no longer have that code. However, the original training is very simple. The results reported in the article use, as the dataset, the same triplets generated for the logistic regression. Then, for each training iteration, you sample a batch of triplets, augment the images, compute the transformed patch coordinates of the objects, extract the patch features from the backbone, and process them with the embedding model (RoI Pooling, dense layer, L2 normalization). For training, I used the triplet loss.
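The embedding head and loss described above could be sketched roughly like this in NumPy (a toy illustration of the dense layer + L2 normalization and the standard triplet loss; the RoI Pooling step, weight shapes, and margin are assumed):

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Normalize each embedding to unit length
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def embed(pooled_feats, W):
    # Hypothetical embedding head: dense layer + L2 normalization,
    # applied to features already extracted via RoI Pooling.
    return l2_normalize(pooled_feats @ W)

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull anchor-positive together, push anchor-negative apart by >= margin
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
```

In an actual training loop, the dense layer weights would be optimized by gradient descent on this loss.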

Some improvements over this training (not implemented in the article) would be to replace the triplet loss with a semi-hard triplet loss or the more recent NT-Xent loss. In those cases, the triplets generated for the logistic regression would no longer be needed.
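For reference, a rough NumPy sketch of the NT-Xent loss over two augmented views of the same objects (a toy version of the SimCLR-style formulation; the temperature value is illustrative):

```python
import numpy as np

def nt_xent(z_i, z_j, tau=0.5):
    # z_i, z_j: (N, D) L2-normalized embeddings of two views of N objects;
    # row k of z_i and row k of z_j form a positive pair.
    z = np.concatenate([z_i, z_j], axis=0)   # (2N, D)
    sim = z @ z.T / tau                      # cosine similarities / temperature
    n = z.shape[0]
    np.fill_diagonal(sim, -np.inf)           # exclude self-similarity
    # Index of each sample's positive: i <-> i + N
    pos = np.concatenate([np.arange(n // 2, n), np.arange(0, n // 2)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return -(sim[np.arange(n), pos] - logsumexp).mean()
```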