Different padding value for pooling layer

hansely commented 5 years ago

I am trying to convert the GoogLeNet prototxt (https://github.com/BVLC/caffe/blob/master/models/bvlc_googlenet/deploy.prototxt) using the caffe_to_nnef converter. However, for pooling layers with no pads specified (e.g. name: "pool1/3x3_s2"), it generates a padding.

For example, for the pooling layer named "pool1/3x3_s2", it generates a padding of [(0,0), (0,0), (0,1), (0,1)]. I think this is similar to issue #31. Is this meant to happen?

gyenesvi commented 5 years ago

Yes, this is intended behaviour. In Caffe, pooling by default rounds up the output size calculation when the stride does not evenly divide the input size, as if the input were padded on the bottom-right side. An extra parameter has recently been introduced to pooling to control this rounding (up or down); the documentation does not mention it, but you can check out the code on GitHub. If you set it to round down, the output size will be one smaller and the padding disappears. (Strictly speaking, it is then an implicit negative padding on the bottom-right, but that does not need to be stated explicitly, because the division in the output shape calculation takes care of it.)
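The rounding behaviour described above can be sketched in Python. This is a minimal sketch of my understanding of how Caffe's pooling layer computes its output size, not code from Caffe or from the converter; the function names are hypothetical. It also shows the explicit bottom-right padding a converter would need in order to reproduce the same output size with a floor-based shape formula, including the implicit negative padding in round-down mode.

```python
def caffe_pool_output_size(in_size, kernel, stride, pad=0, round_up=True):
    """Output size of a Caffe pooling layer along one spatial axis.

    round_up=True mirrors Caffe's default (ceil) rounding; round_up=False
    mirrors the optional round-down (floor) mode.
    """
    span = in_size + 2 * pad - kernel
    if round_up:
        out = -(-span // stride) + 1   # ceil(span / stride) + 1
    else:
        out = span // stride + 1       # floor(span / stride) + 1
    # With padding, Caffe drops the last window if it would start inside the
    # padding rather than inside the image.
    if pad > 0 and (out - 1) * stride >= in_size + pad:
        out -= 1
    return out


def implied_padding(in_size, out_size, kernel, stride, pad=0):
    """(before, after) padding that yields out_size under a floor-based shape
    formula; 'after' is negative when trailing input rows are skipped."""
    after = (out_size - 1) * stride + kernel - in_size - pad
    return pad, after


# pool1/3x3_s2 in GoogLeNet: 3x3 window, stride 2, no pads. The 112x112 input
# size is an assumption (output of conv1/7x7_s2 applied to a 224x224 image).
ceil_out = caffe_pool_output_size(112, 3, 2)                    # 56
floor_out = caffe_pool_output_size(112, 3, 2, round_up=False)   # 55
print(implied_padding(112, ceil_out, 3, 2))    # (0, 1)
print(implied_padding(112, floor_out, 3, 2))   # (0, -1)
```

The (0, 1) result per spatial axis matches the [(0,0), (0,0), (0,1), (0,1)] padding reported in the question.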

hansely commented 5 years ago

Thank you very much for your detailed answer. I have another question regarding auto-padding. I've read the documentation, and from my understanding, when the padding is empty, NNEF will automatically:

1. calculate the total padding size t = (x - 1)·s + fd - X;
2. calculate the padding sizes p = floor(t / 2) and q = ceil(t / 2);
3. set the new padding value to (p, p, q, q).

Is this right? If so, is there a way to extract the x, X, s, and fd values in NNEF?

gyenesvi commented 5 years ago

First, note that NNEF does not calculate anything itself (it is a file format); it defines the calculations, so that an NNEF consumer library can perform them. You are right about 1. and 2. I don't quite understand 3.: the actual padding is [(p_H, q_H), (p_W, q_W)], where H and W stand for the height and width dimensions. X is the input size (so there are actually X_H and X_W), and x is the output size (x_H and x_W), such that x = ceil(X/s), where s (again s_H and s_W) is the stride, and f_d is the dilated filter size (again separately for height and width). Some of these are parameters of the convolution (stride, filter size, dilation); you don't have to derive them, because they are stored in the file (the NNEF parser, for example, reads them from it). The others (X and x) are tensor shapes, which are calculated from the graph by propagating the input shape (the parser also does that).
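The per-axis calculation can be sketched as follows. This is a minimal sketch that applies the formulas stated in this thread (x = ceil(X/s), f_d = (f - 1)·d + 1, t = (x - 1)·s + f_d - X, p = floor(t/2), q = ceil(t/2)) separately to height and width; the helper names are hypothetical, and no clamping of a negative t is applied because the thread does not mention any.

```python
def auto_padding_1d(X, s, f, d=1):
    """Automatic (p, q) padding along one axis.

    X: input size, s: stride, f: filter (or pooling window) size, d: dilation.
    """
    x = -(-X // s)              # output size x = ceil(X / s), integer arithmetic
    fd = (f - 1) * d + 1        # dilated filter size
    t = (x - 1) * s + fd - X    # total padding along this axis
    p = t // 2                  # floor(t / 2)
    q = t - p                   # ceil(t / 2)
    return p, q


def auto_padding(input_hw, stride_hw, filter_hw, dilation_hw=(1, 1)):
    """[(p_H, q_H), (p_W, q_W)] for the two spatial axes."""
    return [auto_padding_1d(X, s, f, d)
            for X, s, f, d in zip(input_hw, stride_hw, filter_hw, dilation_hw)]


# 3x3 window, stride 2, 112x112 input: one extra row/column at the bottom-right.
print(auto_padding((112, 112), (2, 2), (3, 3)))   # [(0, 1), (0, 1)]
```

Using integer arithmetic for the ceil/floor steps avoids any floating-point rounding surprises for large sizes.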

hansely commented 5 years ago

Please check if I'm right.

p_H = floor(((x_H - 1)·s_H + fd_H - X_H) / 2)
q_H = ceil(((x_H - 1)·s_H + fd_H - X_H) / 2)
p_W = floor(((x_W - 1)·s_W + fd_W - X_W) / 2)
q_W = ceil(((x_W - 1)·s_W + fd_W - X_W) / 2)

where all the variables can be obtained as follows:

import nnef

graph = nnef.load_model('dir/to/nnef_graph')
operation = graph.operations
tensors = graph.tensors
attribs = operation.attribs
input_tensor = tensors[operation.inputs['input']]
output_tensor = tensors[operation.outputs['output']]
stride = attribs['stride']
dilation = attribs['dilation']

X_H = input_tensor.shape[2]
X_W = input_tensor.shape[3]
x_H = output_tensor.shape[2]
x_W = output_tensor.shape[3]
s_H = stride[0]
s_W = stride[1]
fd_H = dilation[0]
fd_W = dilation[1]

gyenesvi commented 5 years ago

You are almost right. First, you have to select a specific operation of some index idx:

operation = graph.operations[idx]

Second, you have to calculate fd yourself, because what is stored is the dilation, not the dilated filter size:

d_H = dilation[0]
d_W = dilation[1]

filter = tensors[operation.inputs['filter']]
f_H = filter.shape[2]
f_W = filter.shape[3]
fd_H = (f_H - 1) * d_H + 1
fd_W = (f_W - 1) * d_W + 1

Note that the NNEF-Tools repository has just been updated (the parser and converter code has been refactored). You now have to run shape inference separately after loading the graph, before accessing the shapes:

graph = nnef.load_model('dir/to/nnef_graph')
nnef.infer_shapes(graph)
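Putting the pieces of this reply together, the following sketch collects the per-axis values for one convolution operation (for example, an element of graph.operations). To stay self-contained it works on any object exposing the fields used in this thread (graph.tensors, operation.inputs, operation.outputs, operation.attribs, tensor.shape); those names come from the snippets above and are not checked against a specific NNEF-Tools release. NCHW layout is assumed, and treating an empty stride or dilation list as all ones is also an assumption. The hand-built graph at the end is a hypothetical stand-in for a loaded model after shape inference.

```python
from types import SimpleNamespace


def conv_padding_params(graph, operation):
    """Per-axis X, x, s, d, f and fd for a convolution (NCHW assumed)."""
    tensors = graph.tensors
    attribs = operation.attribs
    input_tensor = tensors[operation.inputs['input']]
    output_tensor = tensors[operation.outputs['output']]
    filter_tensor = tensors[operation.inputs['filter']]

    # Assumption: an empty stride/dilation list means 1 on every axis.
    stride = attribs['stride'] or [1, 1]
    dilation = attribs['dilation'] or [1, 1]

    params = {}
    for axis, name in ((2, 'H'), (3, 'W')):
        k = axis - 2                         # index into the spatial lists
        f = filter_tensor.shape[axis]
        d = dilation[k]
        params[name] = {
            'X': input_tensor.shape[axis],   # input size
            'x': output_tensor.shape[axis],  # output size (after infer_shapes)
            's': stride[k],
            'd': d,
            'f': f,
            'fd': (f - 1) * d + 1,           # dilated filter size
        }
    return params


# Hypothetical stand-in for a graph after loading and shape inference.
graph = SimpleNamespace(tensors={
    'data': SimpleNamespace(shape=[1, 3, 224, 224]),
    'kernel': SimpleNamespace(shape=[64, 3, 7, 7]),
    'conv1': SimpleNamespace(shape=[1, 64, 112, 112]),
})
op = SimpleNamespace(inputs={'input': 'data', 'filter': 'kernel'},
                     outputs={'output': 'conv1'},
                     attribs={'stride': [2, 2], 'dilation': []})

params = conv_padding_params(graph, op)
print(params['H'])
```

The resulting per-axis values can be fed directly into the p/q formulas above.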

gyenesvi commented 5 years ago

Can this issue be closed?

hansely commented 5 years ago

Thank you very much for your detailed explanation. I really appreciate it.

hansely commented 5 years ago

Hi, I have a question regarding the auto-padding. If there is no filter tensor, how are f_H and f_W calculated? For example, in the Inception v4 model (floating-point version) in the NNEF model zoo, line 796 reads:

avg_pool = avg_pool(concat_2, border = 'constant', dilation = [], padding = [], size = [1, 1, 3, 3], stride = [1, 1, 1, 1]);

In this case, how should I calculate the fd_H and fd_W values?

gyenesvi commented 5 years ago

Pooling has a size attribute: in this example, size = [1, 1, 3, 3], where the last two entries are the spatial dimensions (height and width), so f_H = 3 and f_W = 3. You can then calculate fd_H and fd_W the same way as before.
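As a worked example for the avg_pool line quoted above: the spatial window is 3 and the spatial stride is 1. Taking the empty dilation list to mean a dilation of 1 (an assumption, not stated in the thread), the formulas from earlier in the thread give the same padding for any input size X:

```latex
\begin{aligned}
f_d &= (f - 1)\,d + 1 = (3 - 1)\cdot 1 + 1 = 3 \\
x &= \lceil X / 1 \rceil = X \\
t &= (x - 1)\cdot 1 + f_d - X = (X - 1) + 3 - X = 2 \\
p &= \lfloor t / 2 \rfloor = 1, \qquad q = \lceil t / 2 \rceil = 1
\end{aligned}
```

So each spatial axis would get padding (1, 1), which keeps the output the same size as the input.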

hansely commented 5 years ago

So if there is a filter tensor:

f_H = filter.shape[2]
f_W = filter.shape[3]

otherwise:

f_H = size[2]
f_W = size[3]

Am I correct?

gyenesvi commented 5 years ago

Yes, that is correct.
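The rule confirmed here (use the filter tensor if there is one, otherwise the pooling size attribute) could be wrapped in a helper like the one below. It is a sketch assuming NCHW layout and the same field names as in the earlier snippets; the helper names and the stand-in objects are hypothetical. It reads the dilation from the last two entries of the dilation attribute and treats an empty list as ones (an assumption), which works whether the list covers only the spatial axes (as the convolution indexing earlier in this thread assumes) or all four axes (as the pooling size does).

```python
from types import SimpleNamespace


def spatial_filter_size(graph, operation):
    """(f_H, f_W) from the filter tensor if present, else from 'size'."""
    if 'filter' in operation.inputs:
        shape = graph.tensors[operation.inputs['filter']].shape
        return shape[2], shape[3]
    size = operation.attribs['size']   # pooling window over [N, C, H, W]
    return size[2], size[3]


def spatial_dilation(operation):
    """(d_H, d_W) from the last two dilation entries; [] means no dilation."""
    dilation = operation.attribs.get('dilation') or [1, 1]
    return dilation[-2], dilation[-1]


def dilated_filter_size(graph, operation):
    """(fd_H, fd_W) = (f - 1) * d + 1 for each spatial axis."""
    f_H, f_W = spatial_filter_size(graph, operation)
    d_H, d_W = spatial_dilation(operation)
    return (f_H - 1) * d_H + 1, (f_W - 1) * d_W + 1


# Hypothetical stand-ins for a loaded graph and two of its operations.
graph = SimpleNamespace(tensors={'w': SimpleNamespace(shape=[64, 3, 7, 7])})
conv_op = SimpleNamespace(inputs={'input': 'x', 'filter': 'w'},
                          attribs={'stride': [2, 2], 'dilation': [2, 2]})
pool_op = SimpleNamespace(inputs={'input': 'concat_2'},
                          attribs={'size': [1, 1, 3, 3], 'dilation': []})

print(dilated_filter_size(graph, pool_op))   # (3, 3)
print(dilated_filter_size(graph, conv_op))   # (13, 13)
```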

hansely commented 5 years ago

Thanks!

KhronosGroup / NNEF-Tools

Different padding value for pooling layer #74