Problem with ResnetCRNN_varlen

Hello @HHTseng,

I find your code really useful for understanding both the 3D CNN and the CNN + LSTM architectures. However, I think there is a small problem in how variable lengths are handled in the LSTM part. Some videos have as few as 28 frames, and they are padded so that every video has 50 frames. When you decode the LSTM hidden units, though, you take the last frame: https://github.com/HHTseng/video-classification/blob/master/ResNetCRNN_varylength/functions.py#L276 which will be zeros for those shorter videos.

I think we have to rely on the second output of torch.nn.utils.rnn.pad_packed_sequence (the true length of each sequence) to decide which timestep to decode for classification.
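To illustrate what I mean, here is a minimal sketch, not the repo's actual code. The helper name `last_valid_output`, the tensor shapes, and the toy sizes are my own assumptions; only `pack_padded_sequence` and `pad_packed_sequence` are the real PyTorch calls.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


def last_valid_output(lstm, x, lengths):
    """Return the LSTM output at the last real (non-padded) timestep.

    x:       (batch, max_len, feat) zero-padded input (hypothetical shape)
    lengths: (batch,) true number of frames per video
    """
    packed = pack_padded_sequence(
        x, lengths.cpu(), batch_first=True, enforce_sorted=False
    )
    packed_out, _ = lstm(packed)
    # out: (batch, max(lengths), hidden); out_lengths: (batch,) true lengths
    out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
    last_idx = (out_lengths - 1).to(out.device)
    batch_idx = torch.arange(out.size(0), device=out.device)
    # Pick timestep lengths[i] - 1 for sample i instead of out[:, -1, :]
    return out[batch_idx, last_idx]


if __name__ == "__main__":
    torch.manual_seed(0)
    lstm = nn.LSTM(input_size=8, hidden_size=4, batch_first=True)
    lengths = torch.tensor([50, 28])
    x = torch.randn(2, 50, 8)
    x[1, 28:] = 0.0  # second video has only 28 real frames
    print(last_valid_output(lstm, x, lengths).shape)
```

For a unidirectional LSTM fed a packed sequence, the returned final hidden state `h_n[-1]` should hold the same values, so using that instead of indexing the padded output might be an even smaller change.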

Please let me know what you think.
