deepinsight / insightface

State-of-the-art 2D and 3D Face Analysis Project
https://insightface.ai

acc is always about 0.5 using mobilenetface #189

Closed tianxingyzxq closed 6 years ago

tianxingyzxq commented 6 years ago

testing verification..
(12000, 128)
infer time 30.116359
[lfw][6000]XNorm: 38.367005
[lfw][6000]Accuracy-Flip: 0.50000+-0.00000
testing verification..
(14000, 128)
infer time 35.065952
[cfp_fp][6000]XNorm: 38.365932
[cfp_fp][6000]Accuracy-Flip: 0.50000+-0.00000
testing verification..
(12000, 128)
infer time 30.366434
[agedb_30][6000]XNorm: 38.366582
[agedb_30][6000]Accuracy-Flip: 0.50000+-0.00000
[6000]Accuracy-Highest: 0.51533

The train script is:

CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_softmax.py --network y1 --loss-type 4 --margin-s 128 --margin-m 0.5 --per-batch-size 128 --emb-size 128 --data-dir ../datasets/faces_ms1m_112x112 --wd 0.00004 --fc7-wd-mult 10.0 --prefix ../model-mobilefacenet-128

nttstar commented 6 years ago

I have no GPU server to test it right now. The author told me that he ran the experiments by fine-tuning (train with softmax first, then fine-tune with the ArcFace loss).
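For anyone trying to reproduce that recipe, a rough two-stage sketch in the same style as the command above (I'm assuming --loss-type 0 selects plain softmax and that --pretrained takes a 'prefix,epoch' pair, as in the repo's other examples; double-check against your copy of train_softmax.py):

Stage 1, softmax from scratch: CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_softmax.py --network y1 --loss-type 0 --per-batch-size 128 --emb-size 128 --data-dir ../datasets/faces_ms1m_112x112 --prefix ../model-y1-softmax

Stage 2, ArcFace fine-tune: CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_softmax.py --network y1 --loss-type 4 --margin-m 0.5 --per-batch-size 128 --emb-size 128 --data-dir ../datasets/faces_ms1m_112x112 --lr 0.01 --pretrained ../model-y1-softmax,<epoch> --prefix ../model-y1-arcface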

tianxingyzxq commented 6 years ago

"To pursue ultimate performance, MobileFaceNet, MobileFaceNet (112 × 96), and MobileFaceNet (96 × 96) are further trained by ArcFace loss on the cleaned training set of the MS-Celeb-1M database [5] with 3.8M images from 85K subjects." So the further training is fine-tuning, not from scratch?

tianxingyzxq commented 6 years ago

I used ms1m to train the mobilefacenet network with softmax, and the first verification test is still 0.5:

INFO:root:Epoch[0] Batch [1980] Speed: 785.83 samples/sec acc=0.004492
lr-batch-epoch: 0.1 1999 0
testing verification..
(12000, 128)
infer time 5.902649
[lfw][2000]XNorm: 3.298171
[lfw][2000]Accuracy-Flip: 0.50000+-0.00000
testing verification..
(14000, 128)
infer time 6.765173
[cfp_fp][2000]XNorm: 3.160100
[cfp_fp][2000]Accuracy-Flip: 0.50000+-0.00000
testing verification..
(12000, 128)
infer time 5.85961
[agedb_30][2000]XNorm: 3.269653
[agedb_30][2000]Accuracy-Flip: 0.50000+-0.00000
[2000]Accuracy-Highest: 0.50000

marcosly commented 6 years ago

@tianxingyzxq I face the same problem. However, I got a different result after several hours, once the lr had changed to 0.001:

lr-batch-epoch: 0.001 6241 18
testing verification..
(12000, 128)
infer time 8.014871
[lfw][140000]XNorm: 33.146315
[lfw][140000]Accuracy-Flip: 0.98867+-0.00552
testing verification..
(14000, 128)
infer time 9.476789
[cfp_fp][140000]XNorm: 29.224738
[cfp_fp][140000]Accuracy-Flip: 0.84671+-0.02232
testing verification..
(12000, 128)
infer time 8.189307
[agedb_30][140000]XNorm: 33.785845
[agedb_30][140000]Accuracy-Flip: 0.88883+-0.02323

visionxyz commented 6 years ago

Same for me, always about 0.5.

lmmcc commented 6 years ago

[image] I am also puzzled about the result. My result is like this: the training phase has reached 10 epochs, and the training acc is still 0. The lfw result is not as good as other researchers'; the highest it reaches is 0.79.

nttstar commented 6 years ago

I will provide a pretrained model soon.

visionxyz commented 6 years ago

@lmmcc I face the same problem when training with resnet-101: the training acc is low but the lfw accuracy is good.

wsx276166228 commented 6 years ago

@lmmcc The same problem here! lfw accuracy = 0.991, but train accuracy = 0.000000. I'd like to know whether you have solved the issue?

wsx276166228 commented 6 years ago

@nttstar I trained mobilefacenet with two 1080ti GPUs and set batch_size to 256 per GPU. After 20000 batches, the lfw accuracy = 0.991, but the train accuracy = 0.00000. Is this phenomenon normal? If it is a problem, do you know what causes it? Thanks!
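(One possible reading of this, offered as a guess rather than a confirmed explanation: the logged train acc is classification accuracy over the ~85K MS1M identities through the margin-penalized fc7 layer, while the lfw number is pair verification computed from the embeddings, roughly as in the sketch below; the helper name and threshold sweep are made up for illustration.)

```python
import numpy as np

def verification_accuracy(emb1, emb2, labels, num_thresholds=400):
    """Toy pair-verification accuracy: threshold the cosine similarity of
    L2-normalized embedding pairs and keep the best threshold.
    emb1, emb2: (N, D) arrays; labels: (N,), 1 = same person, 0 = different."""
    emb1 = emb1 / np.linalg.norm(emb1, axis=1, keepdims=True)
    emb2 = emb2 / np.linalg.norm(emb2, axis=1, keepdims=True)
    sims = np.sum(emb1 * emb2, axis=1)  # cosine similarity per pair
    best = 0.0
    for t in np.linspace(-1.0, 1.0, num_thresholds):
        acc = np.mean((sims > t).astype(np.int32) == labels)
        best = max(best, acc)
    return best

# A random, untrained embedding scores about 0.5 on balanced pairs,
# which matches the 0.50000 Accuracy-Flip lines early in training.
rng = np.random.default_rng(0)
e1, e2 = rng.normal(size=(6000, 128)), rng.normal(size=(6000, 128))
y = rng.integers(0, 2, size=6000)
print(verification_accuracy(e1, e2, y))
```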

Wisgon commented 6 years ago

@wsx276166228 What final accuracy did you get on lfw, cfp_fp, and agedb_30?

saxenauts commented 6 years ago

@nttstar Hi, did you get a chance to upload the pretrained model for MobileFaceNet? Similar problem here. Thanks.

wsx276166228 commented 6 years ago

@Wisgon lfw=0.985 cfp_fp=0.854 agedb_30=0.921

pribadihcr commented 6 years ago

+1

Wisgon commented 6 years ago

I have changed the learning rate from 0.1 (the default) to 0.01, but the acc is still 0. I'm still running it to see the final result.

wsx276166228 commented 6 years ago

@ShiyangZhang What do you mean? What learning rate did you set? Thanks!

tp-nan commented 6 years ago

@wsx276166228 I had forgotten the learning rate decay described in the original paper. Now I get a much better accuracy on lfw, cfp_fp, and agedb_30. But it is not good enough; maybe my batch size (364) is too small. I'll report the accuracy later.
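(If it helps anyone else: in these training commands the decay schedule is normally set with --lr and --lr-steps, assuming I recall train_softmax.py's flags correctly; the step values below are only illustrative, not the paper's exact schedule.)

CUDA_VISIBLE_DEVICES='0,1,2,3' python -u train_softmax.py --network y1 --loss-type 4 --margin-m 0.5 --per-batch-size 128 --emb-size 128 --data-dir ../datasets/faces_ms1m_112x112 --lr 0.1 --lr-steps '100000,140000,160000' --prefix ../model-mobilefacenet-128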

chenkingwen commented 6 years ago

@wsx276166228 Same problem here. Have you solved it?

chenhuan19871014 commented 6 years ago

Same for me, always near 0.5. Does anyone know how to solve it?

staceycy commented 5 years ago

Same problem. Anyone got any idea?

baishiruyue commented 5 years ago

@tianxingyzxq @nttstar I face the same issue: the accuracy on the three validation sets is always 0.5. I checked the params file of the model and found that some layers' parameters are near zero, so the model cannot be trained any further. How did you solve this? Thanks.
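(For anyone who wants to run the same parameter check, a minimal sketch using MXNet's NDArray loader; 'model-0000.params' is a placeholder for your own checkpoint file.)

```python
import mxnet as mx

# Load a saved checkpoint's parameter file and print per-layer weight statistics,
# to spot layers whose weights have collapsed toward zero.
params = mx.nd.load('model-0000.params')  # placeholder path to your .params file
for name, arr in sorted(params.items()):
    l2 = float(arr.norm().asscalar())
    mean_abs = float(arr.abs().mean().asscalar())
    print('%-50s shape=%s l2=%.6f mean|w|=%.6g' % (name, arr.shape, l2, mean_abs))
```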