HHTseng / video-classification

Tutorial for video classification/ action recognition using 3D CNN/ CNN+RNN on UCF101

images used in train #22


123liluky commented 4 years ago

In UCF101_ResNetCRNN.py:

```python
begin_frame, end_frame, skip_frame = 1, 29, 1
selected_frames = np.arange(begin_frame, end_frame, skip_frame).tolist()

train_set, valid_set = Dataset_CRNN(data_path, train_list, train_label, selected_frames, transform=transform), \
                       Dataset_CRNN(data_path, test_list, test_label, selected_frames, transform=transform)
```

So you just use the first 28 images in each video folder to train the model? The remaining images are not used. Am I right?

HHTseng commented 4 years ago

Yes, in order to have fixed-size tensors as inputs to the CNNs.
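For reference, the selection in UCF101_ResNetCRNN.py produces exactly 28 frame indices, since `np.arange` excludes the end point. A quick check (values below are just the result of running this snippet, not anything repo-specific):

```python
import numpy as np

# Same frame-selection logic as in UCF101_ResNetCRNN.py
begin_frame, end_frame, skip_frame = 1, 29, 1
selected_frames = np.arange(begin_frame, end_frame, skip_frame).tolist()

print(len(selected_frames))                      # 28
print(selected_frames[0], selected_frames[-1])   # 1 28
```

Every clip therefore contributes the same number of frames, which is what keeps the input tensor shape fixed across the batch.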

jaideep11061982 commented 4 years ago

Hi Tseng, thanks for your repo. For `X, y`: what is the size of `X` here? As per your `conv1` structure in `EncoderCNN`, the input should have 4 dimensions, but you are passing 5 dimensions. I didn't understand this:

```python
for t in range(x_3d.size(1)):
    # CNNs
    x = self.conv1(x_3d[:, t, :, :, :])
    x = self.conv2(x)
    x = self.conv3(x)
    x = self.conv4(x)
    x = x.view(x.size(0), -1)
```
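To answer the dimensionality question: the 5-D tensor is `(batch, time, channels, height, width)`, and indexing `x_3d[:, t, :, :, :]` removes the time axis, leaving the 4-D `(batch, channels, height, width)` input a 2-D conv layer expects. A sketch with NumPy (the exact sizes below, 2 clips of 28 frames at 3x224x224, are illustrative assumptions; PyTorch tensors index the same way):

```python
import numpy as np

# Hypothetical 5-D input: (batch, time, channels, height, width)
x_3d = np.zeros((2, 28, 3, 224, 224))

# Slicing out time step t drops the time dimension,
# yielding the 4-D (batch, channels, height, width) tensor
# that conv1 actually receives inside the loop.
frame = x_3d[:, 0, :, :, :]
print(frame.shape)  # (2, 3, 224, 224)
```

So the loop feeds the CNN one frame per time step, and the per-frame features are later stacked along the time axis for the RNN.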