Closed: edubois closed this issue 6 years ago
Try this: model.MaxPool([input],[output], kernels=[3,3,3], pads=[1,1,1]*2, strides=[2,2,2])
For ND operators, caffe2 expects plural forms of argument names (i.e. kernels in place of kernel and so on). As far as the number of padding arguments is concerned, 6 is correct for the 3D case: 2 pads per axis x 3 axes (x,y,z).
I'm using the C++ API; trying your settings gives me the same error: main.cpp(164): error: in "ice_caffe2_neural_ice_suite/check_training_caffe2_neural_ice": Dynamic exception type: class caffe2::EnforceNotMet std::exception::what: [enforce fail at pool_op_cudnn.cu:269] status == CUDNN_STATUS_SUCCESS. 3 vs 0. , Error at: D:/_DEV/3rdParties/caffe2/caffe2/operators/pool_op_cudnn.cu:269: CUDNN_STATUS_BAD_PARAM Error from operator: input: "conv1_conv_1_s" output: "pool1" type: "MaxPool" arg { name: "strides" ints: 2 ints: 2 ints: 2 } arg { name: "pads" ints: 1 ints: 1 ints: 1 ints: 1 ints: 1 ints: 1 } arg { name: "kernels" ints: 3 ints: 3 ints: 3 } device_option { device_type: 1 }
My input tensor has Dims: (11,16,128,128,)
Do you see any obvious mistake? That would help me a lot. Thanks.
The input blob should be NCVHW, i.e. batch_size x feat_size x number_of_video_frames_per_batch x height x width. (V is the number of video frames per video and N is the number of videos i.e. batch_size).
Clearly this is a 5D tensor, while your input has 4 dimensions. If you are processing one video per batch, you should still add a singleton dimension to your tensor to make it 5D.
Typically, to convert the standard NCHW caffe2 input to the 3D format, I use the following numpy code:
input3D = inputNCHW.transpose([1,0,2,3])[np.newaxis]
It is assumed of course that all N images in the batch are frames from the same video.
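For the record, a minimal numpy sketch of this conversion (the batch here is hypothetical random data, just to show the resulting shape):

```python
import numpy as np

# Hypothetical batch of 16 consecutive frames from one video, NCHW layout
inputNCHW = np.random.randn(16, 3, 128, 128).astype(np.float32)

# Swap N and C, then prepend a singleton batch axis -> NCVHW
input3D = inputNCHW.transpose([1, 0, 2, 3])[np.newaxis]

print(input3D.shape)  # (1, 3, 16, 128, 128)
```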
Thanks so much! Is this supposed to be the same for 3D convolutions?
Also, could I maybe use a Reshape operator?
You're welcome. Yes, the Conv and AveragePool operators expect the same input format (NCVHW) and the same arguments (kernels, pads, strides). The caffe2 documentation is in its infancy right now; I had to dig into the code to figure these things out. Sure, you can use the Reshape operator.
Ok, many thanks, that's super helpful.
One more question: for 3D conv, what weight initialization function do you use? Does the following look correct to you with an NCVHW layout? _init.addXavierFillOp({ outSize, inSize, kernel, kernel, kernel }, b + "_w");
I haven't initialized weights for a 3D conv operator from scratch. I ported 3D ResNet models from pytorch to caffe2 from here: https://github.com/kenshohara/video-classification-3d-cnn-pytorch They also provide links to pretrained models, which I used.
Regardless, your weight tensor size is correct. It should be <outSize, inSize, Kernel, Kernel, Kernel>.
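For what it's worth, here is a numpy sketch of an Xavier-style fill for such a 5D weight tensor. The uniform range sqrt(3/fan_in) mirrors what caffe2's XavierFill does as far as I can tell from the filler code; the sizes below are made up:

```python
import numpy as np

def xavier_fill(shape, rng=np.random):
    # Uniform in [-sqrt(3/fan_in), sqrt(3/fan_in)],
    # where fan_in is the number of elements per output unit
    fan_in = int(np.prod(shape[1:]))
    scale = np.sqrt(3.0 / fan_in)
    return rng.uniform(-scale, scale, size=shape).astype(np.float32)

# Hypothetical 3D conv weight: outSize=64, inSize=3, 3x3x3 kernel
w = xavier_fill((64, 3, 3, 3, 3))
print(w.shape)  # (64, 3, 3, 3, 3)
```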
Sorry but I don't get what you are doing here: input3D = inputNCHW.transpose([1,0,2,3])[np.newaxis]
According to the doc on the Transpose operator, you are setting N (index 0 = batch size) as the second axis. Why is that?
Assume NCHW to be number_of_video_frames x feat_size x height x width. With that assumption, let's replace N by V. Thus we have VCHW representation of a tensor. What I am doing is transposing V,C and adding a singleton dimension at axis=0. This can be understood as a two step procedure:
CVHW = VCHW.transpose([1,0,2,3])
NCVHW = CVHW[np.newaxis]  # this results in N = 1, i.e. batch_size is 1
Ok, I see. My case is a bit different: my tensor has n different samples (n = mini-batch size), and each sample's second dimension is made of v consecutive frames, each with a certain number of channels and W x H spatial size.
So let's say I have 7 samples of 5 RGB temporal images with w,h = 128,128: my input tensor is NCHW={7,3*5,128,128} and I'm reshaping to NCVHW={7,3,5,128,128}.
When splitting the dimensions (going from 4 dimensions to 5, then from 5 back to 4), the reshape seems to pass the schema checking, but the backprop doesn't seem to be happy. Still investigating; that's weird.
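At least the shape arithmetic of that split checks out in numpy (random data, purely to verify the element counts):

```python
import numpy as np

# Hypothetical batch: 7 samples, 5 RGB frames packed into the channel dim
x = np.random.randn(7, 3 * 5, 128, 128).astype(np.float32)

# Split C=15 into (C=3, V=5): the element count is unchanged, so the
# reshape itself is legal; whether (c, v) ends up in the right order
# depends on how the frames were packed into the 15 channels
x5d = x.reshape(7, 3, 5, 128, 128)

print(x5d.shape)  # (7, 3, 5, 128, 128)
```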
Ok, I finally succeeded by using the Transpose operator + PrependDim operator. For some reason the Reshape is not working for the 5D case.
Nice. Out of curiosity, what was the error thrown by the Reshape Op?
The following check was not satisfied: reshape_op.h +105
CAFFE_ENFORCE_EQ(
total_size,
size,
"Argument `shape` does not agree with the input data.",
" (",
total_size,
" != ",
size,
")");
The net creation was working, but things turned weird during the backprop: size was missing the V dimension (if I remember correctly).
I looked at the source code of the reshape_op. It looks like you're using a tensor to specify the new shape, and the sizes of the new and old tensors do not match. To reshape {7,3*5,128,128} to NCVHW={7,3,5,128,128}, you can use shape=[0,3,5,0,-1] (0 copies the dimension at the current axis, and -1 lets the op compute the dimension at that axis). Anyhow, glad that your issue is resolved.
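A small Python sketch of how I read those 0/-1 semantics (this is my own reimplementation for illustration, not the caffe2 code itself):

```python
import numpy as np

def resolve_shape(old_shape, new_shape):
    # 0 copies the input dimension at the same axis,
    # -1 is inferred so the total element count is preserved
    out = [old_shape[i] if d == 0 else d for i, d in enumerate(new_shape)]
    total = int(np.prod(old_shape))
    if -1 in out:
        i = out.index(-1)
        known = int(np.prod([d for d in out if d != -1]))
        out[i] = total // known
    assert int(np.prod(out)) == total, "shape does not agree with the input data"
    return out

print(resolve_shape([7, 15, 128, 128], [0, 3, 5, 0, -1]))  # [7, 3, 5, 128, 128]
```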
Wow, I think you got it, thanks a lot! I'll try tomorrow.
I don't know who is responsible for the doc, but I would have loved to have more.
I think there is a bug when trying to use 3d pooling.
I see: conv_pool_op_base.h +150: CAFFE_ENFORCE_EQ(pads.size(), 2 * kernel_.size());
I think it needs to be: CAFFE_ENFORCE_EQ(pads.size(), std::pow(2, kernel_.size()));
The padding is, I suppose, top, left, bottom, right in 2D (2**2 values) and 8 values in 3D, right?
When I try with a padding size of 6, cudnn throws an exception: CUDNN_STATUS_BAD_PARAM: std::exception::what: [enforce fail at pool_op_cudnn.cu:269] status == CUDNN_STATUS_SUCCESS. 3 vs 0. , Error at: D:/_DEV/3rdParties/caffe2/caffe2/operators/pool_op_cudnn.cu:269: CUDNN_STATUS_BAD_PARAM Error from operator: input: "conv5_spatbn_1" output: "pool5" type: "MaxPool" arg { name: "strides" ints: 2 ints: 2 ints: 2 } arg { name: "pads" ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 } arg { name: "kernels" ints: 2 ints: 2 ints: 2 } arg { name: "order" s: "NCHW" } arg { name: "mode" s: "reflect" } device_option { device_type: 1 }