Hi @ZhangYuef, thanks for your attention.
As you can see, the pretraining is in fact simply a softmax classification loss, with every identity being a unique class. I didn't pay much attention to it or tune it, so I don't remember the exact values of the hyperparameters. But they should be somewhere around:

- epochs: 40
- batch size: 64 (not sure)
- lr: 1e-2
- wd: 1e-2
You may tune these a bit to obtain reasonable results.
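A minimal sketch of that pretraining setup, assuming a standard ImageNet-initialized ResNet-50; `num_ids`, `train_loader`, and the momentum value are placeholders, not the repo's actual code:

```python
import torch
import torch.nn as nn
import torchvision.models as models

num_ids = 751  # placeholder: number of training identities

# ImageNet-initialized ResNet-50 with the classifier replaced so that
# every identity is a unique class
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, num_ids)
model = model.cuda()

criterion = nn.CrossEntropyLoss()  # plain softmax classification loss
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                            momentum=0.9, weight_decay=1e-2)

for epoch in range(40):                # ~40 epochs, per the values above
    for images, pids in train_loader:  # train_loader is assumed
        logits = model(images.cuda())
        loss = criterion(logits, pids.cuda())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```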
Hi @KovenYu, thanks for sharing. Following your settings, I reproduced the results of your pretrained model, but I found that I can't get the results in the paper when I use this pretrained model in the second-stage training. I think the parameter distribution of the pretrained model matters a lot for the parameter settings of the second-stage training. Can you share the pretraining code? Thank you very much.
Hi @moodom, thank you for your attention. Did you try using the provided pretrained model, and is that working?
Hi @KovenYu. I used the provided pretrained model and got a good result. But I also trained a pretrained model myself with the LAL loss as described in the paper, with the unit-norm constraint removed; when I used that model in the second stage of training, rank-1 only reached about 56. I tried adjusting the LR and WD, but the results were the same. I also measured the average parameter value of the provided pretrained model's FC layer and the Euclidean distances between the FC-layer column vectors. The results are as follows:

- average of FC-layer parameters: -0.00755771
- mean of column-vector Euclidean distances: 413379.0
- standard deviation of column-vector Euclidean distances: 1.8415e+08

I think that is a very good result: the parameters are very small, but the distances are very large. The pretrained model I trained did not reach that level. Do you use any other training tricks?
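For reference, a minimal sketch of how such statistics can be computed from a checkpoint; the file name and the `fc.weight` key are assumptions about the state-dict layout, and each row of the PyTorch FC weight is taken as one class vector:

```python
import torch

# 'pretrained.pth' and the 'fc.weight' key are placeholders; adjust
# them to the actual checkpoint layout.
state = torch.load('pretrained.pth', map_location='cpu')
W = state['fc.weight']   # PyTorch layout: (num_classes, feat_dim)

print('average of FC parameters:', W.mean().item())

# pairwise Euclidean distances between the per-class weight vectors
d = torch.cdist(W, W)    # (num_classes, num_classes)
off_diag = d[~torch.eye(len(W), dtype=torch.bool)]
print('mean pairwise distance:', off_diag.mean().item())
print('std of pairwise distances:', off_diag.std().item())
```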
@moodom thank you for your detailed description! I looked at the pretraining code and found two notable points: $$a$$ and $$f(x)$$ are not normalized, and the scale factor 30 is also not used.
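A sketch of the difference between the two formulations, assuming the agents $$a$$ are stored as the rows of the FC weight matrix `W` (that layout is an assumption, not the repo's code):

```python
import torch
import torch.nn.functional as F

def logits_pretrain(f, W):
    # pretraining code per the point above: plain inner products,
    # no normalization of f(x) or a, no scale factor
    return f @ W.t()

def logits_paper(f, W, scale=30.0):
    # paper-style formulation: L2-normalize both the features f(x)
    # and the agents a (rows of W), then multiply by the scale 30
    return scale * F.normalize(f, dim=1) @ F.normalize(W, dim=1).t()
```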
Thank you for sharing the code. I set the corresponding parameters according to your description and wanted to redo the pre-training with loss_al. However, with the resulting pre-trained weights, the second stage of training produces a large number of NaNs. The following is my pre-training code: https://github.com/pzhren/Papers/blob/master/%E7%9B%AE%E6%A0%87%E6%A3%80%E6%B5%8B%E4%B8%8Ere-id%E4%BB%BB%E5%8A%A1/MAR-master/src/pretrain.py#L6
The following are my environment and settings during pre-training:

python version: 3.5.4 |Continuum Analytics, Inc.| (default, Aug 14 2017, 13:26:58) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
torch version: 1.1.0
```
do not use pre-trained model. train from scratch. loaded pre-trained model from ../data/resnet50-19c8e357.pth
==>>[2020-03-20 18:12:12] [Epoch=000/060] Stage 1, [Need: 00:00:00] Iter: [000/969] Freq 37.5 loss_total 8.316 loss_source 8.316
```
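Not a fix from the repo, just a generic pattern I can use to localize and guard NaNs of this kind, assuming the loss takes a log over softmax probabilities:

```python
import torch

# flag the first backward op that produces NaN/Inf (slow; debug only)
torch.autograd.set_detect_anomaly(True)

def safe_log(p, eps=1e-8):
    # clamp before the log so that p == 0 cannot produce -inf and
    # then NaN gradients
    return torch.log(p.clamp(min=eps))
```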
Thanks for sharing. I find that the paper, in Section 4.2, mentions that the model is first pretrained using only $$L_{AL}$$. But I don't see how to pretrain the model from the current code; I need some more detailed instructions, e.g., how many epochs I should pretrain the model for.
Thanks >.<