cypw / PyTorch-MFNet

MIT License
252 stars 56 forks

Training tricks #14

Closed Guocode closed 5 years ago

Guocode commented 5 years ago

Do you have any training tricks for UCF101? When I train MFNet on UCF101, it overfits easily. Is pretraining on a large dataset like Kinetics necessary, and what are your key tricks for improving accuracy?

Esaada commented 5 years ago

From my experience you have no chance of training it directly on UCF101, HMDB51, or other small datasets; you have to pretrain on Kinetics.

cypw commented 5 years ago

Hi @Guocode ,

Loading the Kinetics pre-trained model is the best solution, since UCF101/HMDB51 are too small for 3D ConvNets.

Below are some tricks to try if you really do not want to load a Kinetics pre-trained model:

  1. Load an ImageNet pre-trained model.
  2. Enable dropout before the last classifier, or maybe add a random mask that drops some residual blocks randomly. (Please check out "droppath".)
  3. Augment your training data (enable more augmentation options), and maybe use the mixup technique.
  4. If an ImageNet pre-trained model is loaded, adopt a smaller learning rate to avoid forgetting the pre-trained parameters, or freeze all layers except the last classifier for the first several epochs.
  5. Maybe train for fewer epochs to prevent overfitting. (Changing the weight decay may not work.)
  6. Reduce the number of input frames. (This will reduce clip-level prediction accuracy, but will increase video-level prediction accuracy on such a small dataset, because the network will see more samples.)
  7. I am not sure about this, but maybe enable BatchNorm and use a small batch size to introduce more variance and prevent overfitting? (I recommend multi-GPU or even distributed training, with a small batch size on each GPU.)
  8. If BatchNorm is enabled, refine the moving mean / moving variance after training by simply removing the optimizer from the training loop and running the network for several epochs. This helps BatchNorm obtain more accurate moving statistics, which are used during testing/evaluation.
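The "droppath" idea in item 2 (stochastic depth) randomly skips a whole residual block during training and scales the residual at test time. A minimal stdlib-only sketch, where the residual function `f` and the survival probability are assumptions for illustration (a real implementation would operate on PyTorch tensors):

```python
import random

def residual_droppath(x, f, survive_p=0.8, training=True):
    """Residual block with stochastic depth: y = x + f(x), where f(x) is
    randomly dropped during training and scaled by its survival
    probability at inference."""
    if training:
        if random.random() < survive_p:
            return [xi + fi for xi, fi in zip(x, f(x))]
        return list(x)  # block dropped: identity shortcut only
    # at test time, scale the residual by its expected survival rate
    return [xi + survive_p * fi for xi, fi in zip(x, f(x))]
```

Dropping whole blocks regularizes more aggressively than plain dropout, which is why it can help on a dataset as small as UCF101.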
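The mixup technique in item 3 blends pairs of samples and their labels with a Beta-distributed weight. A hypothetical stdlib-only sketch, assuming inputs are flat lists of floats and labels are one-hot lists (in practice this would be done on batched tensors):

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Blend two samples and their one-hot labels with a Beta(alpha, alpha)
    mixing coefficient, as in mixup training."""
    lam = random.betavariate(alpha, alpha)  # mixing coefficient in (0, 1)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

The soft labels discourage the network from memorizing individual training clips, which directly targets the overfitting described above.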
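Item 8 works because BatchNorm's running statistics are exponential moving averages of per-batch statistics, so forward passes in train mode (with the optimizer removed) keep refining them without touching the weights. A stdlib-only sketch of just the update rule, with the momentum value an assumption:

```python
def refine_bn_stats(batches, momentum=0.1):
    """Re-estimate BatchNorm running mean/var by replaying batches
    through the moving-average update, with no weight updates."""
    running_mean, running_var = 0.0, 1.0  # typical BN initial values
    for batch in batches:
        m = sum(batch) / len(batch)
        v = sum((x - m) ** 2 for x in batch) / len(batch)
        running_mean = (1 - momentum) * running_mean + momentum * m
        running_var = (1 - momentum) * running_var + momentum * v
    return running_mean, running_var
```

With a small per-GPU batch size, the per-batch statistics are noisy, so replaying a few extra epochs lets the moving averages settle closer to the true dataset statistics used at evaluation time.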

Thanks for trying our code, and sorry for the late reply.


References mentioned above:
- Deep Networks with Stochastic Depth
- mixup: Beyond Empirical Risk Minimization

Guocode commented 5 years ago

Thanks for the reply.