DavideA / c3d-pytorch

Pytorch porting of C3D network, with Sports1M weights
MIT License

What's the input tensor size of this C3D net? #10

Closed loveritsu929 closed 6 years ago

loveritsu929 commented 6 years ago

Hi, thank you for the implementation of the C3D net. I'm currently trying to train the model on the UCF dataset. As described in the paper, I choose several (10/12/14/16) frames from a clip, unsqueeze each on dimension 1, then concatenate all of them, so the input tensor has size torch.Size([3, 10/12/14/16, 224, 224]).

Then I got an error:

''
File "/home/cxing95/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 371, in max_pool3d
ret = torch._C._nn.max_pool3d(input, kernel_size, stride, padding, dilation, ceil_mode)
RuntimeError: Given input size: (512x1x14x14). Calculated output size: (512x0x8x8). Output size is too small at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THCUNN/generic/VolumetricDilatedMaxPooling.cu:105
''

I'm not sure whether it was raised because of the input size. Could you tell me what the input should look like to train or use this net? That would help a lot.

Thank you.

DavideA commented 6 years ago

Hi, and thanks for your interest. In the trained version I provide, the input shape is (3, 16, 112, 112).

That does indeed seem to be an issue related to input dimensions. I think the 3D pooling layers are squeezing out the temporal dimension entirely, since you use fewer frames per clip (the architecture is tailored for 16 frames).
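To make the shrinking temporal dimension concrete, here is a minimal sketch in plain Python that traces the (frames, height, width) shape through five max-pooling layers. The kernel, stride, and padding values in `POOLS` are an assumption about the usual C3D layout (they are not quoted in this thread), and `pool_out` and `trace_shape` are hypothetical helper names. The convolutions are assumed to preserve size (kernel 3, padding 1), so only the pools are modeled.

```python
# Assumed C3D pooling layout: (kernel, stride, padding), each as (T, H, W).
# These values are an assumption, chosen to match the usual C3D design.
POOLS = [
    ((1, 2, 2), (1, 2, 2), (0, 0, 0)),  # pool1: keeps all frames
    ((2, 2, 2), (2, 2, 2), (0, 0, 0)),  # pool2
    ((2, 2, 2), (2, 2, 2), (0, 0, 0)),  # pool3
    ((2, 2, 2), (2, 2, 2), (0, 0, 0)),  # pool4
    ((2, 2, 2), (2, 2, 2), (0, 1, 1)),  # pool5: spatial padding only
]


def pool_out(size, kernel, stride, padding):
    """Output length of one max-pool axis (dilation 1, floor mode)."""
    return max(0, (size + 2 * padding - kernel) // stride + 1)


def trace_shape(frames, height, width):
    """Return the (T, H, W) shape at the input and after each pooling layer."""
    shape = (frames, height, width)
    shapes = [shape]
    for kernel, stride, padding in POOLS:
        shape = tuple(
            pool_out(s, k, st, p)
            for s, k, st, p in zip(shape, kernel, stride, padding)
        )
        shapes.append(shape)
    return shapes


if __name__ == "__main__":
    for frames, side in [(16, 112), (10, 224)]:
        print(frames, side, trace_shape(frames, side, side))
```

Under these assumptions, a 16-frame 112x112 clip ends at (1, 4, 4), while a 10-frame 224x224 clip reaches (1, 14, 14) before the last pool and (0, 8, 8) after it, which matches the sizes in the reported error message.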

Please also note that if you change the frame resolution, you will need to set the number of input features of the first fully connected layer accordingly.
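As a rough sketch of how that number could be worked out, the snippet below computes the flattened feature count that would feed the first fully connected layer for a given clip size. It assumes the usual C3D layout: 512 channels after the last convolution, size-preserving convolutions, a first pool that halves only height and width, three pools that halve all axes, and a final pool with spatial padding 1. The function name `fc6_in_features` is hypothetical, and none of these values are quoted in this thread.

```python
def fc6_in_features(frames, height, width, channels=512):
    """Flattened size after the last pool, under the assumed C3D layout."""

    def pool(size, kernel=2, stride=2, padding=0):
        # Standard max-pool output length (dilation 1, floor mode).
        return (size + 2 * padding - kernel) // stride + 1

    t, h, w = frames, height, width
    h, w = pool(h), pool(w)                        # pool1: spatial only
    for _ in range(3):                             # pool2, pool3, pool4
        t, h, w = pool(t), pool(h), pool(w)
    t = pool(t)                                    # pool5: no temporal padding
    h, w = pool(h, padding=1), pool(w, padding=1)  # pool5: spatial padding 1

    if min(t, h, w) < 1:
        raise ValueError("input too small: a pooled dimension collapsed to 0")
    return channels * t * h * w


if __name__ == "__main__":
    print(fc6_in_features(16, 112, 112))  # 8192
    print(fc6_in_features(16, 224, 224))  # 32768
```

With these assumptions, 16 frames at 112x112 give 8192 features, and 16 frames at 224x224 give 32768, so the first `Linear` layer would need its `in_features` changed to match.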

Hope this helps, D