HHTseng / video-classification

Tutorial for video classification / action recognition using 3D CNN / CNN+RNN on UCF101

CRNN #8

Status: Open (opened by raghavgarg97-zz 5 years ago)

raghavgarg97-zz commented 5 years ago

Could you please add your CRNN predictions that correspond to the results in your GitHub repo (similar to the predictions added for Conv3D and ResNetCRNN)?

HHTseng commented 5 years ago

Hi, I posted my pretrained CRNN model here; it reaches about 41% accuracy on the test data at epoch 36. Please note that you need to reload these pretrained weights on a single GPU, since the model was trained on a single GPU. Use the following commands to hide the other CUDA devices:

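import os
# Set these before importing torch (or at least before any CUDA call)
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # number GPUs by PCI bus ID, as nvidia-smi does
os.environ["CUDA_VISIBLE_DEVICES"] = "0"         # expose only GPU 0 to this process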
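One likely reason the single-GPU restriction matters (an assumption, not something stated above): when a model is wrapped in torch.nn.DataParallel, every parameter name in its state dict gains a "module." prefix, so weights saved from an unwrapped single-GPU model no longer match. As an alternative to hiding GPUs, a minimal sketch under that assumption rewrites the checkpoint keys before loading. The helper adapt_state_dict_keys and the example key names are hypothetical.

```python
from collections import OrderedDict

PREFIX = "module."  # prefix that torch.nn.DataParallel adds to parameter names


def adapt_state_dict_keys(state_dict, wrapped):
    """Hypothetical helper: return a copy of state_dict whose keys match a
    model that is (wrapped=True) or is not (wrapped=False) inside
    nn.DataParallel. Works on any mapping of name -> tensor."""
    adapted = OrderedDict()
    for key, value in state_dict.items():
        # Strip an existing prefix first so the function is idempotent
        bare = key[len(PREFIX):] if key.startswith(PREFIX) else key
        adapted[PREFIX + bare if wrapped else bare] = value
    return adapted


# Hypothetical keys from weights saved on a single GPU (no prefix)
single_gpu_state = OrderedDict([("conv1.0.weight", 1), ("fc1.bias", 2)])
print(list(adapt_state_dict_keys(single_gpu_state, wrapped=True)))
# ['module.conv1.0.weight', 'module.fc1.bias']
```

With PyTorch this would be used roughly as model.load_state_dict(adapt_state_dict_keys(torch.load(path, map_location="cpu"), wrapped=isinstance(model, torch.nn.DataParallel))), where path and model are placeholders and the checkpoint is assumed to hold a plain state dict.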

I realize the default parameters in the repo are not good for CRNN (they work well for ResNetCRNN). You may find that the training accuracy stays low with the defaults. The problem seems to be that DecoderRNN has too many weights. At least, I found the following settings work better:

RNN_hidden_layers = 1    # in UCF_CRNN.py
self.ch1, self.ch2, self.ch3, self.ch4 = 8, 16, 32, 64   # in class EncoderCNN of functions.py
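To see how much these two settings shrink the model, the parameter counts can be worked out by hand. The sketch below applies the standard formulas for PyTorch's nn.LSTM (four gates, each with an input weight matrix, a hidden weight matrix and two bias vectors) and nn.Conv2d with bias. The sizes in the example (512-dimensional CNN embedding, 512 hidden nodes, kernel sizes 5, 3, 3, 3 and a 32/64/128/256 channel baseline) are assumptions for illustration, not values confirmed in this thread.

```python
def lstm_params(input_size, hidden_size, num_layers):
    """Parameter count of a unidirectional torch.nn.LSTM with biases."""
    total = 0
    for layer in range(num_layers):
        # Layers after the first take the previous layer's hidden state as input
        in_size = input_size if layer == 0 else hidden_size
        # 4 gates x (W_ih, W_hh, b_ih, b_hh)
        total += 4 * (hidden_size * in_size + hidden_size * hidden_size + 2 * hidden_size)
    return total


def conv_stack_params(channels, kernels, in_channels=3):
    """Parameter count of a chain of square-kernel torch.nn.Conv2d layers
    with bias; BatchNorm and the following FC layers are not counted."""
    total = 0
    for out_channels, k in zip(channels, kernels):
        total += out_channels * in_channels * k * k + out_channels
        in_channels = out_channels
    return total


# Assumed sizes, for illustration only
EMBED_DIM, HIDDEN = 512, 512
KERNELS = (5, 3, 3, 3)

for layers in (3, 1):
    print(f"LSTM, {layers} layer(s):", lstm_params(EMBED_DIM, HIDDEN, layers))
for chans in ((32, 64, 128, 256), (8, 16, 32, 64)):
    print(f"conv channels {chans}:", conv_stack_params(chans, KERNELS))
```

Under these assumed sizes, going from three LSTM layers to one drops the decoder from 6,303,744 to 2,101,248 weights, and the narrower channels cut the convolution weights from 389,952 to 24,912. A smaller ch4 likely also shrinks the flattened feature map feeding the encoder's first fully connected layer, which this sketch does not count.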

It will take me some time to find the optimal parameters for CRNN again; I will update my repo later. Thank you for pointing out these problems, and please keep me updated!

Best, HTseng

raghavgarg97-zz commented 5 years ago

I trained my model with the parameters you gave, using a smaller batch size on multiple GPUs. I achieved a test accuracy of 64.17% at the best epoch (86), with an overall training accuracy of 89.75%. When do you think you could update the variable-length sequence code?

HHTseng commented 5 years ago

Good to know that the parameters work! Thanks for the update. Sure, I actually have the variable-length code; I just need to organize it better and test it. I will let you know later!

raghavgarg97-zz commented 5 years ago

By the way, I was also trying out variable-length sequence code. How do you plan to form a batch from variable-length sequences? Are you going to train with a batch size of 1 only, or are you going to divide videos into fixed-length clips, as most papers do?
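The two batching strategies raised in the question can be sketched in plain Python. pad_batch pads every sequence in a batch to the longest length and also returns the true lengths, which is the information PyTorch's torch.nn.utils.rnn.pack_padded_sequence needs so the LSTM skips the padded steps; split_into_clips instead cuts a long video into fixed-length clips that batch normally. Both helper names, the zero padding value, and the clip length and stride in the example are hypothetical choices, not the repo's implementation.

```python
def pad_batch(sequences, pad_value=0):
    """Pad variable-length sequences (lists of per-frame features) to the
    length of the longest one. Returns (padded, lengths).
    pad_value stands in for an all-zero frame feature."""
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


def split_into_clips(num_frames, clip_len, stride):
    """Return (start, end) frame ranges of fixed-length clips; frames after
    the last full clip are dropped, and videos shorter than clip_len give []."""
    return [(start, start + clip_len)
            for start in range(0, num_frames - clip_len + 1, stride)]


# Hypothetical batch: three videos with 5, 3 and 4 frames,
# each frame represented by an integer stand-in feature.
batch, lengths = pad_batch([[1, 2, 3, 4, 5], [6, 7, 8], [9, 10, 11, 12]])
print(batch)    # [[1, 2, 3, 4, 5], [6, 7, 8, 0, 0], [9, 10, 11, 12, 0]]
print(lengths)  # [5, 3, 4]

# A hypothetical 40-frame video cut into 16-frame clips with a stride of 8
print(split_into_clips(40, clip_len=16, stride=8))
# [(0, 16), (8, 24), (16, 32), (24, 40)]
```

In PyTorch, the padded batch would become a tensor of shape (batch, max_len, feature_dim) and go through pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False) before the LSTM. Clips avoid padding altogether; one common choice is then to average the clip scores for a video-level prediction, at the cost of several forward passes per video.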