SHI-Labs / Decoupled-Classification-Refinement

Revisiting RCNN: On Awakening the Classification Power of Faster RCNN (ECCV 2018)
https://arxiv.org/abs/1803.06799
MIT License

Can you give a detailed explanation of DCR-v1? Especially the two .py files in ./dcr_v1/. #3

Open KimSoybean opened 6 years ago

bowenc0221 commented 6 years ago

DCR-v1 is a stand-alone classification network that aims to suppress (hard) false positives in object detection. You can think of DCR-v1 as a classifier; we use ResNet-152 in our paper. The input to DCR-v1 is a batch of images of size 3x224x224, where each image is a proposal from the base detector's output, cropped from the original image.

./dcr_v1/train_rcnn.py is used to train DCR-v1.
./dcr_v1/rcnn_rescore_combined_fast.py is used to combine DCR-v1 results with the base detector's classification results (by simply multiplying the two scores).
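A minimal sketch of that score combination, assuming both the base detector and DCR-v1 output a per-class probability for every box, with boxes in the same order. The function name and the list-of-rows layout are hypothetical and not taken from rcnn_rescore_combined_fast.py.

```python
from typing import List


def combine_scores(base_scores: List[List[float]],
                   dcr_scores: List[List[float]]) -> List[List[float]]:
    """Multiply base-detector and DCR-v1 class probabilities element-wise.

    Hypothetical layout: base_scores[i][c] and dcr_scores[i][c] are the
    probabilities that box i belongs to class c.
    """
    if len(base_scores) != len(dcr_scores):
        raise ValueError("both score lists must cover the same boxes")
    combined = []
    for base_row, dcr_row in zip(base_scores, dcr_scores):
        if len(base_row) != len(dcr_row):
            raise ValueError("both score rows must cover the same classes")
        # The final score of each (box, class) pair is the product of the two.
        combined.append([b * d for b, d in zip(base_row, dcr_row)])
    return combined
```

Because the result is a product, a box keeps a high final score only if both classifiers agree that it is an object of that class.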

0xTechSavvy commented 6 years ago

Hi @bowenc0221,

Can you please give me some ideas about how to handle ROIs of different sizes? Thanks

KimSoybean commented 6 years ago

Thank you! I want to apply DCR-v1 to a one-stage detector. Do you have any ideas about that?

bowenc0221 commented 6 years ago

Hi @hongdayu,

ROIs are first cropped from the original image and resized to 224x224. You may find this code helpful.
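A minimal, dependency-free sketch of that crop-and-resize step, assuming boxes in (x1, y1, x2, y2) pixel coordinates with x2/y2 exclusive and nearest-neighbour interpolation. The linked code may use a different interpolation (for example through an image library), and `crop_and_resize` is a hypothetical name. The result is laid out height x width x channel; transposing it to the 3x224x224 channel-first layout is left to the caller.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # image[y][x] -> (R, G, B)


def crop_and_resize(image: Image, box: Sequence[float], size: int = 224) -> Image:
    """Crop box = (x1, y1, x2, y2) out of image and resize it to size x size
    using nearest-neighbour sampling, whatever the original ROI size."""
    height, width = len(image), len(image[0])
    # Clip the box to the image and convert it to integer pixel bounds.
    x1 = max(0, min(width - 1, int(box[0])))
    y1 = max(0, min(height - 1, int(box[1])))
    x2 = max(x1 + 1, min(width, int(round(box[2]))))
    y2 = max(y1 + 1, min(height, int(round(box[3]))))
    crop_w, crop_h = x2 - x1, y2 - y1
    out: Image = []
    for oy in range(size):
        # Map each output row back to the nearest source row inside the crop.
        sy = y1 + min(crop_h - 1, oy * crop_h // size)
        row = []
        for ox in range(size):
            # Same mapping for columns.
            sx = x1 + min(crop_w - 1, ox * crop_w // size)
            row.append(image[sy][sx])
        out.append(row)
    return out
```

Every ROI, large or small, comes out as a fixed-size square, so proposals of any size can be stacked into one batch for the classifier.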

bowenc0221 commented 6 years ago

Hi @KimSoybean,

Since one-stage detectors produce many more boxes than two-stage detectors (typically 300 for Faster RCNN), you may need to decide on a trade-off between how many boxes you want to process and how to select them.
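One simple way to bound the cost is sketched below, assuming each detection is a (box, score) pair: drop low-scoring boxes, then keep only the highest-scoring `max_boxes`. The 300-box default mirrors the Faster RCNN figure above; `min_score=0.05` and the function name are hypothetical choices, not values from this thread.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]
Detection = Tuple[Box, float]  # (box, score)


def select_boxes(detections: List[Detection], max_boxes: int = 300,
                 min_score: float = 0.05) -> List[Detection]:
    """Keep at most max_boxes detections scoring at least min_score,
    highest scores first."""
    # Discard boxes the base detector is already confident are background.
    kept = [d for d in detections if d[1] >= min_score]
    # Spend the DCR budget on the most confident remaining boxes.
    kept.sort(key=lambda d: d[1], reverse=True)
    return kept[:max_boxes]
```

Raising `max_boxes` lets DCR re-examine more candidates at the price of more 224x224 forward passes per image; lowering it is cheaper but may leave some false positives un-rescored.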

KimSoybean commented 6 years ago

Thank you! @bowenc0221

0xTechSavvy commented 6 years ago

Thank you!!!

KimSoybean commented 6 years ago

In the sampling strategy, pad_indexes are either randomly sampled from all boxes or empty (pad_indexes=[]). This means the same box can appear among both the positive and the negative samples. Am I misunderstanding this?

bowenc0221 commented 6 years ago

@KimSoybean The padding is needed to form a fixed-size batch. For example, if batchsize=32 but we only sampled 30 boxes, then we just pad it to 32 boxes and assign label -1 (ignored during training) for the padded boxes.
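A minimal sketch of that padding, assuming the sampled boxes are given as parallel index and label lists; `pad_batch` and `IGNORE_LABEL` are hypothetical names, not the repository's sampler. It also shows why a padded box that duplicates a real positive or negative sample is harmless: the padded copy carries label -1, so the loss ignores it.

```python
import random
from typing import List, Optional, Tuple

IGNORE_LABEL = -1  # assumed to be skipped by the training loss


def pad_batch(indices: List[int], labels: List[int], batch_size: int,
              num_boxes: int,
              rng: Optional[random.Random] = None) -> Tuple[List[int], List[int]]:
    """Pad sampled box indices and labels up to a fixed batch_size.

    Padded entries reuse randomly chosen boxes (possibly duplicates of real
    samples) but are labelled IGNORE_LABEL.
    """
    if len(indices) != len(labels):
        raise ValueError("indices and labels must have the same length")
    if len(indices) > batch_size:
        raise ValueError("more samples than batch_size")
    rng = rng or random.Random()
    num_pad = batch_size - len(indices)
    # Empty when the sampler already filled the batch (pad_indexes == []).
    pad_indexes = [rng.randrange(num_boxes) for _ in range(num_pad)]
    return indices + pad_indexes, labels + [IGNORE_LABEL] * num_pad
```

With batch_size=32 and 30 sampled boxes, two random boxes are appended with label -1, matching the example above.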

KimSoybean commented 6 years ago

I see. Thank you. @bowenc0221

KimSoybean commented 6 years ago

Why do you multiply the two scores? Doesn't this make the original score lower, or am I misunderstanding?

bowenc0221 commented 6 years ago

@KimSoybean

The reason is that the new classifier (DCR) is trained without location information. If we use only the scores from DCR, we observe very poor performance: in that case DCR reduces to RCNN, and because of its different sampling strategy it performs worse than RCNN when used alone.

Yes, multiplying the two scores makes the final score lower. However, this is a reranking process: DCR decreases the scores of false positives (FPs) by a larger amount than those of true positives (TPs). This changes the relative ranking, so more FPs fall below a predefined threshold and are suppressed. In the final evaluation, only the relative ranking matters.
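A toy illustration of the reranking argument, with made-up scores for one true positive and one false positive (the numbers and the 0.5 threshold are hypothetical, not results from the paper). Both final scores drop, but the false positive drops much more, so the order flips and the threshold removes it.

```python
from typing import List, Tuple


def rerank(dets: List[Tuple[str, float, float]]) -> List[Tuple[str, float]]:
    """dets holds (name, base_score, dcr_score); return (name, combined score)
    pairs sorted by the product of the two scores, highest first."""
    rescored = [(name, base * dcr) for name, base, dcr in dets]
    return sorted(rescored, key=lambda t: t[1], reverse=True)


detections = [
    ("TP", 0.80, 0.95),  # true positive: DCR agrees with the base detector
    ("FP", 0.85, 0.30),  # hard false positive: DCR strongly disagrees
]

# Ranking by the base detector alone puts the false positive first.
before = [name for name, base, _ in sorted(detections, key=lambda d: d[1], reverse=True)]
after = rerank(detections)  # combined TP is about 0.76, FP about 0.255
kept = [name for name, score in after if score >= 0.5]  # hypothetical threshold
print(before, [name for name, _ in after], kept)
```

Both combined scores are below their base scores, yet the true positive now ranks above the false positive, which is what matters for AP.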

KimSoybean commented 6 years ago

Thanks! @bowenc0221. Have you trained DCR for more than 9 epochs? Would overfitting happen? I'd like to know more details because I'm using DCR for one-stage face detection.

bowenc0221 commented 6 years ago

@KimSoybean I haven't trained it for more epochs. I simply trained it for the same number of epochs as used for Faster RCNN, which might not be the optimal setting. However, I think overfitting might happen if you do not use any data augmentation (one-stage detectors are trained for more epochs because they use strong data augmentation).

emergencyd commented 5 years ago

Hi, I have another question: how do you assign labels to the images passed to the stage-2 DCR model (v1)? Is it exactly the same criterion as in stage 1?