Network training is unstable (it tends to oscillate) when the batch size is small.
So, if I want to reproduce the same AP, I need GPUs with enough memory to use a large batch size. Is that right?
Yes......
@TIAN-Xiao did you try using group norm instead of batch norm, or freezing batch norm after pretraining on ImageNet? It's common practice.
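For reference, a minimal sketch of what "freezing batch norm" usually looks like in PyTorch (the helper name `freeze_bn` is illustrative, not from this repo):

```python
import torch.nn as nn

def freeze_bn(model: nn.Module) -> None:
    """Freeze every BatchNorm2d layer: keep the ImageNet running statistics
    and stop updating the affine parameters. Useful when the training batch
    is too small for stable batch statistics."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            m.eval()  # use stored running_mean / running_var instead of batch stats
            if m.weight is not None:
                m.weight.requires_grad = False
            if m.bias is not None:
                m.bias.requires_grad = False

# Note: model.train() flips BN layers back to training mode, so call
# freeze_bn(model) again after every model.train() call.
```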
@biubug6 Could you share the training log of mobilenet0.25? When I use this repo to reproduce the result, the batch_size in my config is 512, but I get a lower result: Easy Val AP: 0.74, Medium Val AP: 0.60, Hard Val AP: 0.29.
The original batch_size is 32. You increased it 16×, so the number of iterations decreased 16×. That's bad: if you increase the batch size 16×, you should also increase the learning rate 16× to compensate (the linear scaling rule).
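In code, the linear scaling rule is a single multiplication (a sketch: `base_lr = 1e-3` is an assumed default and the model is a placeholder):

```python
import torch

model = torch.nn.Linear(10, 2)   # placeholder model; use the actual detector here

base_batch_size = 32             # original mobilenet0.25 config, per the comment above
base_lr = 1e-3                   # assumed original initial learning rate

batch_size = 512
lr = base_lr * batch_size / base_batch_size   # 1e-3 * 16 = 0.016 (~ the 0.015 tried below)

optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=5e-4)
```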
@rydenisbak yeah, I also increased the learning rate to 0.015 (~16×), but the performance is still bad.
@biubug6
Why don't you use multiscale training?
@twmht I randomly crop the image (scale 0.3~1.0) and resize it to a fixed size, which has a similar effect to multiscale training.
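A rough re-implementation of that augmentation (illustrative only; the repo's actual version also remaps the face boxes to the crop, which is omitted here):

```python
import random

import cv2
import numpy as np

def random_crop_and_resize(img: np.ndarray, out_size: int = 640) -> np.ndarray:
    """Crop a random square patch whose side is 0.3~1.0 of the image's short
    side, then resize it to a fixed training size. Face boxes would have to
    be shifted and filtered to match the crop, which is not shown."""
    h, w = img.shape[:2]
    scale = random.uniform(0.3, 1.0)
    side = max(1, int(scale * min(h, w)))
    y = random.randint(0, h - side)
    x = random.randint(0, w - side)
    patch = img[y:y + side, x:x + side]
    return cv2.resize(patch, (out_size, out_size))
```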
@felixfuu Anyway, the final learning rate may be too high; try a cosine LR schedule: https://pytorch.org/docs/stable/optim.html#torch.optim.lr_scheduler.CosineAnnealingLR
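Minimal usage of that scheduler (a sketch; the model, initial lr, and epoch count are placeholders):

```python
import torch

model = torch.nn.Linear(10, 2)   # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.015, momentum=0.9)

epochs = 250                     # assumed total; set to your own schedule length
# anneal the lr from 0.015 down toward 0 along a cosine curve over `epochs`
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... train one epoch ...
    scheduler.step()
```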
@biubug6 After training for 50 epochs, the learning rate remains the same. What's the matter?
On the WIDER validation set, the input can be fixed to a single shape and the mAP is still not low, even though the WIDER validation ground-truth labels belong to images of many different sizes.
The batch_size is important, and it influences the learning rate: when you increase the batch_size, the lr should also increase, and vice versa.
I also could not reproduce the results. With single-scale testing I could only achieve:
Easy: 91.99%
Medium: 89.78%
Hard: 61.22%
My model does not perform well on hard samples.
Edited: I reproduced all the results and added additional backbones (mbv2, mbv1, resnet18/34/50).
Hi, I used your code to train a RetinaFace detector with the ResNet50 pre-trained model and the WIDER FACE data you provide. I didn't change any configuration parameter in config.py except batch_size (from the original 24 down to 4). However, I cannot get the same AP as yours. With single-scale testing, the result is as follows: Easy Val AP 0.918, Medium Val AP 0.874, Hard Val AP 0.620. I want to know how to get the same AP, and why there is such a big gap between my result and yours. Thanks a lot.
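If GPU memory is what forces the batch down to 4, one workaround (not mentioned in this thread, but standard practice) is gradient accumulation to emulate the original batch of 24; `loader` and `compute_loss` below are hypothetical stand-ins for the repo's training loop:

```python
import torch

model = torch.nn.Linear(10, 2)                      # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

accum_steps = 24 // 4   # emulate the original batch of 24 with micro-batches of 4

optimizer.zero_grad()
for step, (images, targets) in enumerate(loader):   # `loader` yields micro-batches of 4 (hypothetical)
    loss = compute_loss(model, images, targets)     # hypothetical loss helper
    (loss / accum_steps).backward()                 # average gradients across the group
    if (step + 1) % accum_steps == 0:
        optimizer.step()                            # one weight update per 24 images
        optimizer.zero_grad()
```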