facebookarchive / caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework.
https://caffe2.ai
Apache License 2.0

3D pooling exception throw #2375

Closed edubois closed 6 years ago

edubois commented 6 years ago

I think there is a bug when trying to use 3d pooling.

I see: conv_pool_op_base.h +150: CAFFE_ENFORCE_EQ(pads.size(), 2 * kernel_.size());

I think it needs to be: CAFFE_ENFORCE_EQ(pads.size(), std::pow(2, kernel_.size()));

The padding is, I suppose, top, left, bottom, right in 2D (2**2 = 4 values), so I would expect 8 values in 3D, right?

When I try with a padding size of 6, cuDNN throws an exception:

    CUDNN_STATUS_BAD_PARAM: std::exception::what: [enforce fail at pool_op_cudnn.cu:269] status == CUDNN_STATUS_SUCCESS. 3 vs 0.
    Error at: D:/_DEV/3rdParties/caffe2/caffe2/operators/pool_op_cudnn.cu:269: CUDNN_STATUS_BAD_PARAM
    Error from operator:
    input: "conv5_spatbn_1" output: "pool5" type: "MaxPool"
    arg { name: "strides" ints: 2 ints: 2 ints: 2 }
    arg { name: "pads" ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 ints: 0 }
    arg { name: "kernels" ints: 2 ints: 2 ints: 2 }
    arg { name: "order" s: "NCHW" }
    arg { name: "mode" s: "reflect" }
    device_option { device_type: 1 }


siddharthachandra commented 6 years ago

Try this:

    model.MaxPool([input], [output], kernels=[3,3,3], pads=[1,1,1]*2, strides=[2,2,2])

For ND operators, caffe2 expects plural forms of argument names (i.e. kernels in place of kernel and so on). As far as the number of padding arguments is concerned, 6 is correct for the 3D case: 2 pads per axis x 3 axes (x,y,z).
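
If it helps, here is a rough sketch of the same pooling op built directly with core.CreateOperator from the Python frontend (the blob names are just placeholders), showing the plural argument names and the 2 pads per axis x 3 axes = 6 padding values:

    from caffe2.python import core

    maxpool_3d = core.CreateOperator(
        "MaxPool",
        ["pool_in"],                 # placeholder input blob
        ["pool_out"],
        kernels=[3, 3, 3],           # one kernel size per spatial axis
        strides=[2, 2, 2],           # one stride per spatial axis
        pads=[1, 1, 1, 1, 1, 1],     # 2 pads per axis x 3 axes = 6 values
    )
    print(maxpool_3d)                # prints the serialized OperatorDef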

edubois commented 6 years ago

I'm using the C++ API; trying with your settings gives me the same error:

    main.cpp(164): error: in "ice_caffe2_neural_ice_suite/check_training_caffe2_neural_ice": Dynamic exception type: class caffe2::EnforceNotMet
    std::exception::what: [enforce fail at pool_op_cudnn.cu:269] status == CUDNN_STATUS_SUCCESS. 3 vs 0.
    Error at: D:/_DEV/3rdParties/caffe2/caffe2/operators/pool_op_cudnn.cu:269: CUDNN_STATUS_BAD_PARAM
    Error from operator:
    input: "conv1_conv_1_s" output: "pool1" type: "MaxPool"
    arg { name: "strides" ints: 2 ints: 2 ints: 2 }
    arg { name: "pads" ints: 1 ints: 1 ints: 1 ints: 1 ints: 1 ints: 1 }
    arg { name: "kernels" ints: 3 ints: 3 ints: 3 }
    device_option { device_type: 1 }

My input tensor has Dims: (11,16,128,128,)

Do you see any obvious mistake? That would help me a lot. Thanks.

siddharthachandra commented 6 years ago

The input blob should be NCVHW, i.e. batch_size x feat_size x number_of_video_frames x height x width, where N is the number of videos (i.e. the batch size) and V is the number of video frames per video.

Clearly this is a 5D tensor, while your input has 4 dimensions. If you are processing one video per batch, you should still add a singleton dimension to your tensor to make it 5D.

Typically, to convert the standard NCHW caffe2 input to the 3D format, I use the following numpy code:

    input3D = inputNCHW.transpose([1,0,2,3])[np.newaxis]

It is assumed, of course, that all N images in the batch are frames from the same video.
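
As a self-contained illustration with made-up shapes (16 frames, 3 channels, 128x128), assuming all frames come from one video:

    import numpy as np

    # A batch of 16 frames in the standard caffe2 NCHW layout; here the batch
    # axis is really the frame axis V, so the layout is (V, C, H, W).
    inputNCHW = np.zeros((16, 3, 128, 128), dtype=np.float32)

    # Swap V and C, then prepend a singleton batch axis N = 1.
    input3D = inputNCHW.transpose([1, 0, 2, 3])[np.newaxis]   # (N, C, V, H, W)
    print(input3D.shape)   # (1, 3, 16, 128, 128)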

edubois commented 6 years ago

Thanks so much! Is this supposed to be the same for 3D convolutions?

edubois commented 6 years ago

Also, could I maybe use a reshape operator instead?

siddharthachandra commented 6 years ago

You're welcome. Yes, the Conv and AveragePool operators expect the same input format (NCVHW) and the same parameters (kernels, pads, strides). The caffe2 documentation is in its infancy right now, and I had to dig into the code to figure these things out. Sure, you can use the reshape operator.
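
For example, here is a sketch of a 3D Conv and AveragePool built the same way as the MaxPool above (blob names and sizes are placeholders, not taken from your net):

    from caffe2.python import core

    conv_3d = core.CreateOperator(
        "Conv",
        ["data_ncvhw", "conv1_w", "conv1_b"],   # input, weights, bias
        ["conv1"],
        kernels=[3, 3, 3],
        strides=[1, 1, 1],
        pads=[1, 1, 1, 1, 1, 1],
    )

    avgpool_3d = core.CreateOperator(
        "AveragePool",
        ["conv1"],
        ["pool1"],
        kernels=[2, 2, 2],
        strides=[2, 2, 2],
        pads=[0, 0, 0, 0, 0, 0],
    )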

edubois commented 6 years ago

Ok, many thanks, that's super helpful.

edubois commented 6 years ago

One more question: for 3D conv, what weight initialization function do you use? Does the following look correct to you with an NCVHW layout?

    _init.addXavierFillOp({ outSize, inSize, kernel, kernel, kernel }, b + "_w");

siddharthachandra commented 6 years ago

I haven't initialized the weights of a 3D conv operator from scratch. I ported 3D ResNet models from PyTorch to caffe2 from here: https://github.com/kenshohara/video-classification-3d-cnn-pytorch They also provide links to pretrained models, which I used.

Regardless, your weight tensor size is correct. It should be <outSize, inSize, Kernel, Kernel, Kernel>.
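
For what it's worth, here is a sketch of the equivalent fill in the Python frontend, mirroring your C++ call (out_size, in_size and kernel are placeholder values):

    from caffe2.python import core

    out_size, in_size, kernel = 64, 3, 3   # placeholder sizes
    conv1_w_init = core.CreateOperator(
        "XavierFill",
        [],
        ["conv1_w"],
        shape=[out_size, in_size, kernel, kernel, kernel],   # <outSize, inSize, K, K, K>
    )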

edubois commented 6 years ago

Sorry, but I don't get what you are doing here: input3D = inputNCHW.transpose([1,0,2,3])[np.newaxis]

According to the doc on the Transpose operator, you are setting N (index 0 = batch size) as the second dimension. Why is that?

siddharthachandra commented 6 years ago

Assume NCHW to be number_of_video_frames x feat_size x height x width. With that assumption, let's replace N by V, so we have a VCHW representation of the tensor. What I am doing is transposing V and C, then adding a singleton dimension at axis=0. This can be understood as a two-step procedure:

    CVHW = VCHW.transpose([1,0,2,3])
    NCVHW = CVHW[np.newaxis]  # this results in N = 1, i.e. batch_size is 1

edubois commented 6 years ago

Ok, I see. My case is a bit different, as my tensor has n different samples (n = mini-batch size), and the second dimension of each sample is made of v consecutive frames, each with a certain number of channels and a W x H resolution.

So let's say I have 7 samples, each with 5 RGB temporal images of w,h = 128,128. My input tensor is NCHW = {7, 3*5, 128, 128}, and I'm reshaping it to NCVHW = {7, 3, 5, 128, 128}.

When splitting the dimensions, i.e. changing from 4 dimensions to 5 and then from 5 back to 4, the reshape seems to pass the schema checking, but the backprop doesn't seem to be happy. Still investigating; that's weird.

edubois commented 6 years ago

Ok, I finally succeeded by using the Transpose operator + the PrependDim operator. The Reshape op, for some reason, is not working for 5D reshaping.
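
For reference, a rough sketch of those two operators via the Python frontend (single-video case as above; blob names are placeholders):

    from caffe2.python import core

    # (V, C, H, W) -> (C, V, H, W)
    transpose_op = core.CreateOperator(
        "Transpose", ["data_vchw"], ["data_cvhw"], axes=[1, 0, 2, 3]
    )

    # Prepend a leading axis of size 1, giving (1, C, V, H, W)
    prepend_op = core.CreateOperator(
        "PrependDim", ["data_cvhw"], ["data_ncvhw"], dim_size=1
    )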

siddharthachandra commented 6 years ago

Nice. Out of curiosity, what was the error thrown by the Reshape Op?

edubois commented 6 years ago

The following check was not satisfied: reshape_op.h +105

      CAFFE_ENFORCE_EQ(
          total_size,
          size,
          "Argument `shape` does not agree with the input data.",
          " (",
          total_size,
          " != ",
          size,
          ")");

The net creation was working, but it's when doing the backprop that things turned weird: size was missing the V channel dim (if I remember well).

siddharthachandra commented 6 years ago

I looked at the source code of the reshape_op. It looks like you're using a tensor to specify the new shape, and the sizes of the new and old tensors do not match. To reshape {7,3*5,128,128} to NCVHW={7,3,5,128,128}, you can use shape=[0,3,5,0,-1] (0 copies the dimension at the current axis, and -1 lets the op infer the dimension at the current axis). Anyhow, glad that your issue is resolved.
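
As a sketch of that call (note that Reshape in caffe2 has a second output carrying the old shape, which the backward pass needs; blob names here are placeholders):

    from caffe2.python import core

    reshape_op = core.CreateOperator(
        "Reshape",
        ["data_nchw"],                     # e.g. shape (7, 15, 128, 128)
        ["data_ncvhw", "data_old_shape"],  # reshaped data + saved old shape
        shape=[0, 3, 5, 0, -1],            # 0 = copy that dim, -1 = infer
    )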

edubois commented 6 years ago

Wow, I think you got it, thanks a lot! Will try tomorrow.

I don't know who is responsible for the doc, but I would have loved to have more.