ruiming46zrm opened 5 years ago
Yes, I noticed the difference a while ago. The drop_ratio is a mistake, and the choice of se_model was arbitrary.
Please open a PR if you can get better performance.
With the modified network, a large batch_size = 384 (4 GPUs), lr = 0.1, milestones = [3, 6, 9, 12], and drop_ratio = 0.4, I trained ir_se50 and got:
LFW accuracy: 99.78%
AgeDB-30: 97.58%
MegaFace rank-1: 96.4%
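The schedule described above (lr = 0.1 dropped at milestones [3, 6, 9, 12]) can be sketched with a standard PyTorch optimizer and scheduler. This is only an illustration of the hyperparameters quoted in the comment: the `model` stand-in, momentum, and weight decay are my assumptions, not values from the repo.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

# Hypothetical stand-in; the real ir_se50 model comes from the repo.
model = torch.nn.Linear(512, 512)

# batch_size = 384 across 4 GPUs, lr = 0.1 (momentum/weight_decay assumed)
optimizer = SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
# lr is divided by 10 at epochs 3, 6, 9 and 12
scheduler = MultiStepLR(optimizer, milestones=[3, 6, 9, 12], gamma=0.1)

for epoch in range(16):
    # ... one training epoch over the 384-sample batches would go here ...
    optimizer.step()   # placeholder step so the scheduler advances validly
    scheduler.step()   # applies the decay when a milestone epoch is passed
```

After all four milestones, the learning rate ends at 0.1 × 0.1⁴ = 1e-5.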
I think the drop ratio and batch size matter a lot.
@ruiming46zrm Can you share the modifications you made in detail, especially to the network?
@bnulihaixia As above: 1. the shortcut; 2. the added BN.
You may also try a large batch size and a large lr to get a better result.
@ruiming46zrm thanks for your response.
Hi, do we need to normalize the images of Facescrub and MegaFace before feeding them to the model to extract features? Thank you for reading.
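For what it's worth, features are usually extracted with the same normalization used at training time. A minimal sketch, assuming the common ArcFace-style convention of mapping uint8 pixels to [-1, 1] (equivalent to `transforms.Normalize([0.5]*3, [0.5]*3)`); the function name and 112×112 crop size here are illustrative:

```python
import numpy as np

def preprocess(img_uint8):
    """Normalize an aligned uint8 face crop (H, W, 3) to [-1, 1], CHW.

    Assumed convention: same Normalize(mean=0.5, std=0.5) transform as the
    training data, so Facescrub/MegaFace probes are embedded in a
    consistent input range.
    """
    x = img_uint8.astype(np.float32) / 255.0   # [0, 1]
    x = (x - 0.5) / 0.5                        # [-1, 1]
    return x.transpose(2, 0, 1)                # HWC -> CHW for PyTorch

white = preprocess(np.full((112, 112, 3), 255, dtype=np.uint8))
black = preprocess(np.zeros((112, 112, 3), dtype=np.uint8))
```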
@TreB1eN Did you modify it like this on purpose? If so, why? Or was it unintentional?
Hi @TreB1eN, I've been studying your code for a few days; it's very nice work. But comparing the network with the mxnet version, I found two key differences:
1. In the bottleneck's res layer, the mxnet version has a BN between the first conv and the PReLU.
2. Your shortcut method makes the first unit of the first get_block use MaxPool2d(1, 2) rather than Conv + BN, because the input channel count (64) equals depth; that differs from the mxnet code. Maybe a Conv + BN shortcut should be used there too.
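A sketch of what an mxnet-style unit with those two changes might look like. This is my reconstruction, not the repo's actual code: the class name and layer ordering are assumptions, with the two discussed changes marked in comments.

```python
import torch
import torch.nn as nn

class BottleneckIR(nn.Module):
    """Hypothetical mxnet-style IR residual unit.

    Two changes relative to the torch version under discussion:
      1. a BN between the first conv and the PReLU;
      2. the shortcut is Conv + BN whenever stride != 1, even when
         in_channel == depth, instead of falling back to MaxPool2d(1, 2).
    """
    def __init__(self, in_channel, depth, stride):
        super().__init__()
        if stride == 1 and in_channel == depth:
            self.shortcut = nn.Identity()
        else:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channel, depth, 1, stride, bias=False),
                nn.BatchNorm2d(depth),
            )
        self.res_layer = nn.Sequential(
            nn.BatchNorm2d(in_channel),
            nn.Conv2d(in_channel, depth, 3, 1, 1, bias=False),
            nn.BatchNorm2d(depth),   # the BN the mxnet version has here
            nn.PReLU(depth),
            nn.Conv2d(depth, depth, 3, stride, 1, bias=False),
            nn.BatchNorm2d(depth),
        )

    def forward(self, x):
        return self.res_layer(x) + self.shortcut(x)

# e.g. first unit of the first block: in_channel == depth == 64, stride == 2,
# which now gets a Conv + BN shortcut instead of MaxPool2d(1, 2)
unit = BottleneckIR(64, 64, 2)
y = unit(torch.randn(1, 64, 56, 56))
```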
I don't know whether that is the reason for the low MegaFace results. There are other differences too: mxnet apparently did not use the SE model to train res50, so why do we? And drop_ratio = 0.4 in mxnet but 0.6 in torch.
Do you have any suggestions?
I will make the above changes and train.