@zehuichen123 Wow, since you have 6 V100 GPUs, I think you could modify the configuration: 1. use a larger batch size and a correspondingly scaled learning rate; 2. if the batch size is large enough, you can change GN to BN or SyncBN; 3. you can evaluate every 30 epochs and stop earlier, because of the SGDR (cosine restart) scheduler. A rough config sketch is shown below.
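Here is a minimal sketch of those changes, assuming an mmcv/mmdet-style Python config; the field names and values are illustrative and may not match the actual TinaFace config.

```python
# Illustrative config fragment (mmcv/mmdet-style, not the exact TinaFace config).
gpus = 6
samples_per_gpu = 4            # 1. raise this if GPU memory allows

data = dict(samples_per_gpu=samples_per_gpu, workers_per_gpu=4)

# 1. Linear scaling rule: lr proportional to total batch size
#    (the default 3.75e-3 is for 3 GPUs x 4 img/gpu = 12 images).
optimizer = dict(
    type='SGD',
    lr=3.75e-3 * gpus * samples_per_gpu / 12,
    momentum=0.9,
    weight_decay=5e-4,
)

# 2. With a large enough per-GPU batch, swap GN for (Sync)BN.
norm_cfg = dict(type='SyncBN', requires_grad=True)

# 3. Evaluate every 30 epochs so early stopping lines up with the SGDR restarts.
evaluation = dict(interval=30, metric='mAP')
```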
Thanks for your advice! I'll give it a try. Besides, if I train with 6 GPUs, the learning rate should be adjusted to 2 x 3.75e-3, right?
@zehuichen123 The default learning rate in the config files is for 3 GPUs and 4 img/gpu (batch size = 3 x 4 = 12). According to the Linear Scaling Rule, you need to set the learning rate proportional to the total batch size if you use a different number of GPUs or images per GPU, e.g., lr = 2 x 3.75e-3 for 6 GPUs x 4 img/gpu.
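As a quick sanity check of that arithmetic (a standalone sketch, not code from this repo):

```python
# Linear Scaling Rule: scale the default lr by the ratio of total batch sizes.
base_lr = 3.75e-3                 # default, tuned for 3 GPUs x 4 img/gpu
base_batch = 3 * 4                # = 12

gpus, imgs_per_gpu = 6, 4
new_lr = base_lr * (gpus * imgs_per_gpu) / base_batch

print(new_lr)                     # 0.0075, i.e. 2 x 3.75e-3
```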
lr = 2 x 3.75e-3 for 6 GPUs x 4 img/gpu:
| epoch | easy | medium | hard | all-ap |
|---|---|---|---|---|
| 30 | 0.9281 | 0.9218 | 0.8903 | 0.780 |
| 60 | 0.9492 | 0.9426 | 0.9130 | 0.805 |
| 90 | 0.9549 | 0.9485 | 0.9194 | 0.812 |
| 120 | 0.9588 | 0.9514 | 0.9225 | 0.817 |
| 150 | 0.9609 | 0.9542 | 0.9258 | 0.817 |
| 180 | 0.9606 | 0.9547 | 0.9272 | 0.818 |
| 210 | 0.9622 | 0.9561 | 0.9281 | 0.822 |
| 240 | 0.9623 | 0.9557 | 0.9290 | 0.823 |
| 270 | 0.9608 | 0.9547 | 0.9288 | 0.823 |
| 300 | 0.9616 | 0.9555 | 0.9276 | 0.823 |
| 330 | 0.9612 | 0.9553 | 0.9294 | 0.823 |
| 360 | 0.9609 | 0.9552 | 0.9289 | 0.824 |
| 390 | 0.9637 | 0.9571 | 0.9294 | 0.824 |
| 420 | 0.9640 | 0.9575 | 0.9303 | 0.824 |
| 450 | 0.9634 | 0.9575 | 0.9303 | 0.826 |
| 480 | 0.9635 | 0.9569 | 0.9290 | 0.823 |
| 510 | 0.9633 | 0.9573 | 0.9303 | 0.826 |
| 540 | 0.9625 | 0.9558 | 0.9295 | 0.824 |
| 570 | 0.9629 | 0.9557 | 0.9293 | 0.824 |
| 600 | 0.9628 | 0.9569 | 0.9304 | 0.826 |
| 630 | 0.9619 | 0.9558 | 0.9287 | 0.824 |
Performance is evaluated with https://github.com/wondervictor/WiderFace-Evaluation. Note that results from this Python evaluation code may vary slightly.
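For anyone reproducing the numbers above, a rough sketch of how that evaluation script is typically invoked; the `-p`/`-g` flags and directory layout are taken from that repo's README and should be treated as assumptions:

```python
# Rough sketch: run wondervictor/WiderFace-Evaluation on the detector's output.
# The -p/-g flags and paths below are assumptions based on that repo's README;
# adjust them to your local setup.
import subprocess

subprocess.run(
    [
        "python", "evaluation.py",
        "-p", "./widerface_predictions",   # per-event .txt prediction files
        "-g", "./ground_truth",            # WIDER FACE val ground-truth files
    ],
    cwd="WiderFace-Evaluation",            # the cloned evaluation repo
    check=True,
)
```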
@jiankangdeng Did you train from scratch or fine-tune? I trained from scratch, but the hard AP is only 90.3 and the improvement is not obvious.
I only used the ResNet-50 backbone pretrained weights from official PyTorch; the detector itself was trained from scratch. I got almost the same result as jiankangdeng.
@zehuichen123 Thanks for your reply. I achieved a similar result, but I have another question: the loss oscillates violently during training. Can you help me figure this out?
@mifan0208 This is related to the learning rate schedule. TinaFace adopts a cosine restart (SGDR) schedule, so the learning rate jumps back to its peak every 30 epochs, which causes the loss to increase at each restart.
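To make that concrete, here is a minimal standalone sketch of a cosine restart schedule; a 30-epoch period is assumed to match the discussion above, and the actual TinaFace config may use a different minimum lr or restart weights:

```python
import math

def cosine_restart_lr(epoch, base_lr=7.5e-3, period=30, min_lr=0.0):
    """Learning rate under a cosine schedule that restarts every `period` epochs."""
    t = epoch % period  # position within the current cycle
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t / period))

# At each restart (epoch 30, 60, ...) the lr jumps back to base_lr,
# which is why the training loss spikes roughly every 30 epochs.
for e in (29, 30, 31):
    print(e, round(cosine_restart_lr(e), 6))   # ~2e-05, 0.0075, ~0.007479
```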
@zehuichen123 Ok, thanks for your reply.
Hi, thanks for this great work! However, the training time is quite long (about 60 hours on 6 V100s), which makes it hard for us to verify other ideas with this code. Have you ever tried a shorter schedule, and how does it perform?