HHTseng / video-classification

Tutorial for video classification/ action recognition using 3D CNN/ CNN+RNN on UCF101
916 stars 216 forks

Too many parameters (fc layers) in both the CNN encoder and the RNN decoder, causing dramatic overfitting! #51

Open mashijie1028 opened 2 years ago

mashijie1028 commented 2 years ago

Both the CNN encoder and the RNN decoder have too many fc layers; one in each is enough. When I trained the CRNN with a single fc layer in both the CNN encoder and the LSTM decoder, I got over 70% test accuracy, although the model still overfit heavily. As num_fc_layers increases, performance degrades.
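To give a rough sense of scale, here is a back-of-the-envelope parameter count for the fc head that sits on top of the CNN features. The layer widths (1024, 768, 512) are assumptions chosen for illustration, not values confirmed in this thread; only the 2048-dimensional pooled feature of ResNet-152 is a property of the architecture itself.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def fc_stack_params(sizes):
    """Total parameters of consecutive fc layers with the given widths."""
    return sum(linear_params(a, b) for a, b in zip(sizes, sizes[1:]))


FEATURE_DIM = 2048  # pooled feature size of ResNet-152
EMBED_DIM = 512     # assumed per-frame embedding size fed to the LSTM

# Assumed three-layer head versus a single projection layer.
three_fc = fc_stack_params([FEATURE_DIM, 1024, 768, EMBED_DIM])
one_fc = fc_stack_params([FEATURE_DIM, EMBED_DIM])

print(f"three fc layers: {three_fc:,} parameters")
print(f"one fc layer:    {one_fc:,} parameters")
print(f"ratio:           {three_fc / one_fc:.2f}x")
```

Under these assumed widths the three-layer head has roughly three times as many trainable parameters as a single projection, which is at least consistent with the heavier overfitting described above.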

Also, BatchNorm probably conflicts with dropout: dropout can distort the statistics that BN relies on, and BN already acts as a regularizer. Removing dropout may work better.
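A toy sketch of why dropout can disturb BN statistics, assuming dropout is applied before a BN layer: with inverted dropout, the activations that BN sees during training have a larger variance than the ones it sees at evaluation time, when dropout is switched off. The numbers below come from synthetic Gaussian data, not from this repository.

```python
import random
import statistics


def inverted_dropout(values, p_drop, rng):
    """Zero each value with probability p_drop and rescale survivors by 1/keep."""
    keep = 1.0 - p_drop
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
# Synthetic unit-variance activations standing in for a layer's output.
activations = [rng.gauss(0.0, 1.0) for _ in range(20_000)]

# Training: BN running statistics are collected on dropped-out activations.
train_var = statistics.pvariance(inverted_dropout(activations, 0.5, rng))
# Evaluation: dropout is the identity, so BN sees the raw activations.
eval_var = statistics.pvariance(activations)

print(f"variance seen by BN in training:   {train_var:.2f}")
print(f"variance seen by BN in evaluation: {eval_var:.2f}")
```

For zero-mean inputs and a drop probability of 0.5, the training-time variance is about twice the evaluation-time variance, so the running variance stored by BN no longer matches what it receives at test time.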

mashijie1028 commented 2 years ago

How did you get 85.68% test accuracy with ResNet-152 + LSTM? Could you please share the hyperparameters? Thanks! @HHTseng

mashijie1028 commented 2 years ago

I used a pretrained ResNet-18 + LSTM and got over 80% test accuracy, but only 40% when training ResNet-18 + LSTM from scratch. It seems that pretraining the ResNet CNN encoder on ImageNet is essential.