MCG-NJU / VideoMAE

[NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
https://arxiv.org/abs/2203.12602

Accuracy on small datasets is too low #84

Closed binbinjiang0505 closed 1 year ago

binbinjiang0505 commented 1 year ago

As you said in your article, “We demonstrate that VideoMAE is a data-efficient learner that could be successfully trained with only 3.5k videos.” I use 8 GPUs with a batch_size of 8 and a dataset of 17,400 videos, and I train for 800 epochs, but the accuracy is very low, less than 1%. Should I adjust my learning rate? Looking forward to your reply.
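For reference, here is how I am computing my effective learning rate, assuming the repo follows the usual linear scaling rule in its fine-tuning scripts (the base value and the 256 divisor are assumptions on my part, not taken from the repo):

```python
# Sketch of the linear learning-rate scaling rule (assumed, not verified
# against the repo's scripts): the base lr is scaled by the total batch
# size across all GPUs, relative to a reference batch size of 256.
base_lr = 1e-3          # hypothetical base learning rate from a config
batch_size_per_gpu = 8  # my setting
num_gpus = 8            # my setting

total_batch_size = batch_size_per_gpu * num_gpus  # 64
effective_lr = base_lr * total_batch_size / 256   # 2.5e-4

print(f"effective lr: {effective_lr:.2e}")
```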

yztongzhan commented 1 year ago

Hi @binbinjiang0505! Please refer to our scripts for UCF-101.

yztongzhan commented 1 year ago

Hi @binbinjiang0505! Any update?

binbinjiang0505 commented 1 year ago

> Hi @binbinjiang0505! Any update?

I tested various possibilities. During training, train_loss decreases significantly. At first I thought it was overfitting, so I evaluated on the training set itself, but the accuracy was still around 0.5%, and 1/174 ≈ 0.5%, i.e. chance level. So I currently suspect something goes wrong when the model is saved: each checkpoint saved every 20 epochs is 1.05 GB, and every saved model behaves like an untrained, freshly initialized one. But I haven't modified any code, so I'm very confused now. (The dataset I used is SSV2, with 100 videos selected per category, giving 17,400 videos in total.)
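To check whether the checkpoints actually change over training, here is a minimal sanity check I am running (the checkpoint paths are placeholders for my own files, and the `"model"` key is an assumption about how the training code packages its state dict):

```python
import torch

# Load two checkpoints saved at different epochs (paths are placeholders).
ckpt_a = torch.load("checkpoint-20.pth", map_location="cpu")
ckpt_b = torch.load("checkpoint-40.pth", map_location="cpu")

# Assumption: the state dict lives under a "model" key, as is common in
# this kind of training code; fall back to the raw dict otherwise.
sd_a = ckpt_a.get("model", ckpt_a)
sd_b = ckpt_b.get("model", ckpt_b)

# If no parameter tensor differs across epochs, the optimizer's updates
# are not reaching the saved weights (or the wrong model is being saved).
num_diff = sum(
    not torch.equal(sd_a[k], sd_b[k])
    for k in sd_a.keys() & sd_b.keys()
)
print(f"parameters that changed between the two checkpoints: {num_diff}")
```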

sulizhi commented 1 year ago

Hi! Have you solved your problem? I ran into the same issue. Looking forward to your reply!