MVIG-SJTU / AlphAction

Spatio-Temporal Action Localization System

Questions about the training schedule #31

Closed xyang35 closed 4 years ago

xyang35 commented 4 years ago

Thanks for sharing the great work! I'm trying to reproduce the baseline result (ResNet-50) in PyTorch following the schedule you provided. However, I only get 21.4% mAP, much lower than the 26.5% reported in the paper. I have a few questions, as follows.

1) As the training schedule is reported in "iterations" in the paper and the codebase, do you have any idea how many "epochs" it is roughly equivalent to? I used 10 epochs in my experiments.

2) The learning rate used in this paper (0.004 for clip_size 64) is quite small compared with other papers (e.g., in LFB, 0.04 for clip_size 16). It seems the model is not sufficiently trained with this small learning rate after 10 epochs. I'm wondering whether I've misunderstood something here. I tried using base_lr=0.008 and got 23.2% mAP.

Again, thanks for your work, and it would be great if you could help me with this problem. My training schedule is summarized here: (Max Epochs: 10, Base_lr: 0.004, Batch_size: 64, Lr_decay: at 6 / 8 epochs)

yelantf commented 4 years ago

The released code differs slightly from the hyper-parameters given in the paper (we changed the loss weights when we released the code). For now, you can simply use the provided config file. If you want to use a larger batch size, adjust the learning rate following the linear scaling rule and scale the scheduler as well, so that training runs for the same number of epochs.
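As a rough illustration of the linear scaling rule mentioned above, here is a minimal sketch that multiplies the learning rate by the batch-size ratio and divides the iteration counts by the same ratio, so the number of epochs stays the same. The function name `rescale_schedule` and all numbers in the example are hypothetical; they are not taken from the released config file.

```python
def rescale_schedule(base_lr, max_iter, steps, ref_batch, new_batch):
    """Rescale a schedule defined for ref_batch so it suits new_batch.

    The learning rate grows linearly with the batch size, while the
    iteration counts shrink by the same factor, keeping epochs constant.
    """
    factor = new_batch / ref_batch
    return {
        "base_lr": base_lr * factor,
        "max_iter": round(max_iter / factor),
        "steps": tuple(round(s / factor) for s in steps),
    }


# Hypothetical reference schedule (NOT the values from the released config).
scaled = rescale_schedule(
    base_lr=0.001,
    max_iter=100_000,
    steps=(60_000, 80_000),
    ref_batch=16,
    new_batch=64,
)
print(scaled)
```

With a 4x larger batch, the sketch returns a 4x larger learning rate and one quarter of the iterations and decay steps.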

xyang35 commented 4 years ago

Thanks for your reply! As I'm reproducing the results in PyTorch, I cannot directly use the provided config file for training. One question is how many epochs I should train for. I'm not sure how to convert the number of iterations in your config file to a number of epochs (one epoch being one pass over the whole dataset). Thanks.

yelantf commented 4 years ago

The training set includes about 200,000 video clips, so for the iteration number in the config file, epoch ~= 16*iteration_num/200000. I recommend reproducing the results with this codebase, since not all details are spelled out in the paper (e.g., the focal loss and the weight of each loss).
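The conversion above can be sketched in a few lines of Python. This assumes the factor 16 in the formula is the number of clips processed per iteration and treats 200,000 as the approximate training-set size. The helper names and the 100,000-iteration example are hypothetical.

```python
def iterations_to_epochs(iteration_num, batch_size=16, dataset_size=200_000):
    """epoch ~= batch_size * iteration_num / dataset_size"""
    return batch_size * iteration_num / dataset_size


def epochs_to_iterations(epochs, batch_size=16, dataset_size=200_000):
    """Inverse conversion, rounded to the nearest whole iteration."""
    return round(epochs * dataset_size / batch_size)


# A hypothetical 100,000-iteration schedule at 16 clips per iteration.
print(iterations_to_epochs(100_000))            # 8.0
# The 10-epoch, batch-size-64 schedule described earlier in the thread.
print(epochs_to_iterations(10, batch_size=64))  # 31250
```

Because the dataset size is only approximate, the resulting epoch counts should be read as estimates.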

xyang35 commented 4 years ago

Thanks for your reply!

Chuckie-He commented 3 years ago

Hi, I'd like to know why, when computing the loss, the weight of the pose loss is 1.2 rather than 14.0. (The weight of the object loss is 49.0 and the weight of the person loss is 17.0.) Thank you!