AmigoCDT / MXNet-MobileNetV3

A Gluon implement of MobileNetV3
Apache License 2.0

Did you modify the file when you executed the MobileNetV3? #3

Open pywin opened 5 years ago

pywin commented 5 years ago

Did you modify the fc_type argument of get_fc1(last_conv, num_classes, fc_type, input_channel=512) in deepinsight/recognition/symbol/symbol_utils.py when you ran deepinsight/recognition/train.py for MobileNetV3?

AmigoCDT commented 5 years ago

No, I did not change the fc_type in get_fc1; it defaults to "E". If it does not work in insightface/recognition, you could try copying the symbol file to src/symbol and running src/train_softmax.py instead.

AmigoCDT commented 5 years ago

I found that my insightface copy is not the newest, so there are some bugs in this implementation. I think I need to add a symbol_utils.get_fc1 call inside get_symbol to run recognition/train.py, but it is not needed for src/train_softmax.py. Thank you.

pywin commented 5 years ago

Thank you very much! I ran src/train_softmax.py but got very low accuracy with fc_type='E': acc = 0.000078 or 0.000156 at 2000 batches. I'm sure my data is OK, because I ran insightface/recognition/train.py for MobileNetV1 without error and accuracy improved quickly. Do you know where the problem lies? Thank you~

AmigoCDT commented 5 years ago

Does the accuracy on lfw, cfp_fp, and agedb_30 increase? If those increase, it is normal. For ArcFace, the net needs to see at least 100k images before train_acc rises above 0, and with a MobileNet backbone we should first train with loss_type=0; loss_type=4 is too hard to converge from scratch. Does your "2000 batch" mean 2000 iterations, i.e. 2000 * per_batch_size * gpu_num images?
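As a quick sanity check on that 100k-image threshold, here is the arithmetic under the settings quoted later in this thread (per_batch_size=256, ctx_num=1; both taken from the Namespace dump below, assumed to match pywin's run):

```python
# Back-of-envelope: images seen after 2000 iterations.
# Assumed values (from the Namespace settings quoted later in this thread):
# per_batch_size=256, one GPU.
iterations = 2000
per_batch_size = 256
gpu_num = 1

images_seen = iterations * per_batch_size * gpu_num
print(images_seen)  # 512000 - well past the ~100k warm-up mentioned above
```

So if "2000 batch" really means 2000 iterations, the net has already seen about half a million images, and train_acc should no longer be pinned at zero.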

AmigoCDT commented 5 years ago

In my training of ArcFace with loss_type=0 and mobilenet_v3: at the 8th epoch, my training accuracy is about 0.55; at the end of the 0th epoch, it was about 0.019.

The following are my arguments and settings for training MobileNet-V3 ArcFace:

```
Called with argument: Namespace(batch_size=256, beta=1000.0, beta_freeze=0, beta_min=5.0,
bn_mom=0.9, ce_loss=False, ckpt=2, color=0, ctx_num=1, cutoff=0, data_dir='G:/datasets/faces_emore',
easy_margin=0, emb_size=512, end_epoch=100000, fc7_lr_mult=1.0, fc7_no_bias=False,
fc7_wd_mult=10.0, gamma=0.12, image_channel=3, image_h=112, image_size='112,112', image_w=112,
images_filter=0, loss_type=0, lr=0.1, lr_steps='240000,360000,440000', margin=4, margin_a=1.0,
margin_b=0.0, margin_m=0.5, margin_s=64.0, max_steps=0, mom=0.9, network='t3',
num_classes=85742, num_layers=3, per_batch_size=256, power=1.0, prefix='D:/insightface/model-m3-0/model',
pretrained='', rand_mirror=1, rescale_threshold=0, scale=0.9993,
target='cfp_ff,cfp_fp,agedb_30', use_deformable=0, verbose=2000, version_act='prelu', version_input=1,
version_multiplier=1.0, version_output='E', version_se=0, version_unit=3, wd=4e-05)

INFO:root:loading recordio G:/datasets/faces_emore\train.rec...
```
pywin commented 5 years ago

Thank you for paying so much attention to my question. I believe the problem is that lines 276-280 in mobilenetv3.py should be commented out, return self._layers(x) changed to return self.layers(x), and symbol_utils.get_fc1(body, num_classes, 'E') added in get_symbol(). I use /deepinsight/insightface/recognition/train.py this way; although the result is better than before, the convergence rate is still relatively slow. Didn't you modify mobilenetv3.py and put it directly in /insightface/recognition/symbol?

AmigoCDT commented 5 years ago

This version is the same as the one I use, not modified. In fact, self._layers is right, because self.layers is just a plain list, not a gluon.nn.HybridSequential(). In my training, MobileNet-V3 ArcFace is worse than y1 in insightface. I don't know why; I compared my implementation with others' and found no difference. 😭
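A minimal pure-Python analogy (not MXNet code, just an illustration) of why the forward pass must call the wrapped container rather than the plain list of layers:

```python
# A plain list of layer callables is not itself callable, so `layers(x)` fails;
# a sequential wrapper (here `sequential`, a tiny stand-in for what
# gluon.nn.HybridSequential provides) chains them into one callable.
layers = [lambda x: x + 1, lambda x: x * 2]

def sequential(layer_list):
    """Return one callable that applies each layer in order."""
    def forward(x):
        for layer in layer_list:
            x = layer(x)
        return x
    return forward

net = sequential(layers)
print(net(3))  # 8, i.e. (3 + 1) * 2
# layers(3) would raise TypeError: 'list' object is not callable
```

This mirrors the situation in mobilenetv3.py: self.layers is the raw list, while self._layers is the HybridSequential built from it, so return self._layers(x) is the correct form.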

AmigoCDT commented 5 years ago

I will run a training with the small mode of MobileNet-V3 in the next few days to confirm whether the result is really that bad.

pywin commented 5 years ago

It's a meaningful job. There is no problem with the convergence speed or accuracy this time, but a new problem has arisen: the saved models are more than 100M.

```python
if self.mode == 'large':
    self.last.add(nn.Conv2D(960, kernel_size=1, strides=1, use_bias=False))
    self.last.add(nn.BatchNorm())
    self.last.add(HardSwish())
```

I removed the GlobalAvgPool2d() and the 1x1 Conv2D. Maybe the model is too large because of the removal of GlobalAvgPool2d().

pywin commented 5 years ago

I ran your code these days and the generated model is 107M. Shouldn't MobileNetV3 be about 4M? What size is your model? Thank you~
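One possible contributor to a file that large, sketched as hedged arithmetic: without GlobalAvgPool2d(), the embedding FC operates on the flattened spatial feature map instead of a pooled 1x1 vector. The numbers below assume a 960-channel final feature map at 7x7 (from a 112x112 input), emb_size=512, and fp32 weights; all of these are assumptions for illustration, not measurements of this repo.

```python
# Hedged back-of-envelope: size in MB of a single FC layer mapping the final
# feature map to a 512-d embedding, at 4 bytes per fp32 parameter.
def fc_megabytes(in_channels, h, w, emb_size, bytes_per_param=4):
    return in_channels * h * w * emb_size * bytes_per_param / 1e6

print(fc_megabytes(960, 1, 1, 512))  # ~2.0 MB with global average pooling (1x1 map)
print(fc_megabytes(960, 7, 7, 512))  # ~96.3 MB on the flattened 7x7 map
```

Under those assumptions, the un-pooled FC alone accounts for most of a ~107M checkpoint, which is consistent with the suspicion above that removing GlobalAvgPool2d() inflates the model.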

AmigoCDT commented 5 years ago

Your model is 107M? That is weird. Maybe you should use insightface/deploy/model_slim.py to cut off the last FC layer and keep only the backbone net. The model size also puzzles me! According to the paper, the model size of V3-Small 1.0 should be about 2.9M * 4 = 11.6M, but mine is about 14.5M. I am comparing my implementation with the paper.
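To quantify why cutting the classification head with model_slim.py can matter so much, here is a hedged estimate of the fc7 weight size, assuming (per the training log above) num_classes=85742 and emb_size=512 with fp32 weights; the exact contents of any given checkpoint may differ:

```python
# Rough size of the fc7 classification weight that slimming would strip.
# Assumed shapes: num_classes=85742, emb_size=512, fp32 = 4 bytes per weight.
num_classes = 85742
emb_size = 512

fc7_mb = num_classes * emb_size * 4 / 1e6
print(round(fc7_mb, 1))  # ~175.6 MB for this one layer alone
```

With 85k identities, the classifier weight dwarfs the backbone, so keeping only the backbone for deployment is the natural fix.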

clhne commented 4 years ago

How did you write the get_symbol function in train_softmax.py when using args.network[0] for mobilenetv3 (i.e., your network = t3)? Thanks.