Network training is unstable (it tends to oscillate) when the batch size is small.
So, if I want to reproduce the same AP, I need GPUs with enough memory to use a large batch size. Is that right?
Yes......
@TIAN-Xiao did you try using group norm instead of batch norm, or freezing batch norm after pretraining on ImageNet? It's common practice.
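For reference, a minimal sketch of what "freezing batch norm" usually looks like in PyTorch (the helper name `freeze_bn` is illustrative, not from this repo):

```python
import torch.nn as nn

def freeze_bn(model: nn.Module) -> None:
    """Freeze every BatchNorm2d layer: keep the ImageNet running statistics
    and stop updating the affine parameters. Useful when the training batch
    is too small for stable batch statistics."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()  # use stored running_mean / running_var instead of batch stats
            if m.weight is not None:
                m.weight.requires_grad = False
            if m.bias is not None:
                m.bias.requires_grad = False

# Note: model.train() flips BN layers back to training mode, so call
# freeze_bn(model) again after every model.train() call.
```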
@biubug6 Could you share the training log of mobilenet0.25? When I use this repo to reproduce the result, the batch_size in my config is 512, but I get a lower result: Easy Val AP: 0.74, Medium Val AP: 0.60, Hard Val AP: 0.29.
The original batch_size is 32. You increased it 16×, so the number of iterations decreased 16×. That's bad: if you increase the batch size 16×, you should also increase the learning rate 16× to compensate (the linear scaling rule).
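In code, the linear scaling rule is a single multiplication (a sketch: `base_lr = 1e-3` is an assumed default and the model is a placeholder):

```python
import torch

model = torch.nn.Linear(10, 2)   # placeholder model; use the actual detector here

base_batch_size = 32             # original mobilenet0.25 config, per the comment above
base_lr = 1e-3                   # assumed original initial learning rate

batch_size = 512
lr = base_lr * batch_size / base_batch_size   # 1e-3 * 16 = 0.016 (~ the 0.015 tried below)

optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)
```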
@rydenisbak yeah, I also increased the learning rate to 0.015 (~16×), but the performance is still bad.
@biubug6
Why don't you use multiscale training?
@twmht I randomly crop the image (scale 0.3~1.0) and resize it to a fixed size, which has a similar effect to multiscale training.
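A rough re-implementation of that augmentation (illustrative only; the repo's actual version also remaps the face boxes to the crop, which is omitted here):

```python
import random

import cv2
import numpy as np

def random_crop_and_resize(img: np.ndarray, out_size: int = 640) -> np.ndarray:
    """Crop a random square patch whose side is 0.3~1.0 of the image's short
    side, then resize it to a fixed training size. Face boxes would have to
    be shifted and filtered to match the crop, which is not shown."""
    h, w = img.shape[:2]
    scale = random.uniform(0.3, 1.0)
    side = max(1, int(scale * min(h, w)))
    y = random.randint(0, h - side)
    x = random.randint(0, w - side)
    patch = img[y:y + side, x:x + side]
    return cv2.resize(patch, (out_size, out_size))
```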
@felixfuu Anyway, the final learning rate may be too high; try a cosine LR schedule: https://pytorch.org/docs/stable/optim.html#torch.optim.lr_scheduler.CosineAnnealingLR
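Minimal usage of that scheduler (a sketch; the model, initial lr, and epoch count are placeholders):

```python
import torch

model = torch.nn.Linear(10, 2)   # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.015, momentum=0.9)

epochs = 250                     # assumed total; set to your own schedule length
# anneal the lr from 0.015 down toward 0 along a cosine curve over `epochs`
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... train one epoch ...
    scheduler.step()
```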
@biubug6 After training for 50 epochs, the learning rate remains the same. What's the matter?
On the WIDER validation set, the input can be fixed to a single shape and the mAP is still not low, even though the WIDER validation ground-truth labels belong to images of many different sizes.
The batch_size is important, and it influences the learning rate: when you increase the batch_size, the lr should also increase, and vice versa.
I also could not reproduce the results. With single-scale testing I could only achieve:
Easy: 91.99%
Medium: 89.78%
Hard: 61.22%
My model does not perform well on hard samples.
Edited: I reproduced all the results and added additional backbones (mbv2, mbv1, resnet18/34/50).
Hi, I used your code to train a RetinaFace detector with the ResNet50 pre-trained model and the WIDER FACE data you provide. I didn't change any configuration parameter in config.py except batch_size (from the original 24 down to 4). However, I cannot get the same AP as yours. With single-scale testing, the result is as follows: Easy Val AP 0.918, Medium Val AP 0.874, Hard Val AP 0.620. I want to know how to get the same AP, and why there is such a big gap between my result and yours. Thanks a lot.
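If GPU memory is what forces the batch down to 4, one workaround (not mentioned in this thread, but standard practice) is gradient accumulation to emulate the original batch of 24; `loader` and `compute_loss` below are hypothetical stand-ins for the repo's training loop:

```python
import torch

model = torch.nn.Linear(10, 2)                      # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

accum_steps = 24 // 4   # emulate the original batch of 24 with micro-batches of 4

optimizer.zero_grad()
for step, (images, targets) in enumerate(loader):   # `loader` yields micro-batches of 4 (hypothetical)
    loss = compute_loss(model, images, targets)     # hypothetical loss helper
    (loss / accum_steps).backward()                 # average gradients across the group
    if (step + 1) % accum_steps == 0:
        optimizer.step()                            # one weight update per 24 images
        optimizer.zero_grad()
```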