Yes, this is intended behaviour. By default, Caffe rounds up the pooling output size when the stride does not evenly divide the span the window slides over (input size minus kernel size, plus any padding). The effect is the same as if the input were padded on the bottom-right side. A parameter was recently added to pooling to control the rounding (up or down); the documentation does not mention it, but you can check the code on GitHub. If you set it to round down, the output size will be one smaller and the padding disappears. (Strictly, this is then an implicit negative padding on the bottom-right, but it does not need to be written explicitly, because the division in the output shape calculation takes care of it.)
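To make the rounding concrete, here is a minimal Python sketch of the output size calculation as I understand Caffe's pooling layer to do it. The function names and the `round_up` flag are hypothetical, chosen for illustration; they are not Caffe or NNEF API. The 112 input size is what "pool1/3x3_s2" would receive for a 224x224 image, which is an assumption about how the network is fed.

```python
import math


def caffe_pool_output_size(size, kernel, stride, pad=0, round_up=True):
    """Output extent of a pooling layer along one spatial dimension.

    round_up=True mimics Caffe's default (ceil); round_up=False mimics
    the round-down mode. Both names are illustrative, not Caffe API.
    """
    span = size + 2 * pad - kernel
    if round_up:
        pooled = math.ceil(span / stride) + 1
    else:
        pooled = span // stride + 1
    # With explicit padding, the last window must still start inside
    # the (left-padded) input; otherwise drop it.
    if pad > 0 and (pooled - 1) * stride >= size + pad:
        pooled -= 1
    return pooled


def implied_extra_padding(size, kernel, stride, pad=0, round_up=True):
    """Extra bottom/right padding needed to realise the output size."""
    pooled = caffe_pool_output_size(size, kernel, stride, pad, round_up)
    return (pooled - 1) * stride + kernel - size - 2 * pad


# 3x3 pooling with stride 2 on a 112-wide input
print(caffe_pool_output_size(112, 3, 2))                 # rounding up
print(implied_extra_padding(112, 3, 2))                  # extra padding on one side
print(implied_extra_padding(112, 3, 2, round_up=False))  # rounding down
```

With rounding up, the extra padding comes out as 1 per spatial dimension, which matches the (0,1) pairs the converter emits; with rounding down it comes out as -1, the implicit negative padding mentioned above.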
Thank you very much for your detailed answer. I have another question regarding the auto-padding. I've read the documentation and from my understanding, when the padding is empty, NNEF will automatically
Is this right? If yes, is there a way to extract x,X,s,fd values in NNEF?
First, note that NNEF does not calculate anything (it is a file format); it defines the quantities, so that an NNEF consumer library can do the calculations. You are right about 1. and 2. I don't quite understand 3.: the actual padding is [(p_H, q_H), (p_W, q_W)], where H and W stand for the height and width dimensions. X is the input size (so there are actually X_H and X_W) and x is the output size (x_H and x_W), such that x = ceil(X/s), where s is the stride (again s_H and s_W) and f_d is the dilated filter size (again separately for height and width). Some of these are parameters of the convolution (stride, filter size, dilation); you don't have to extract them, because they are in the file (the NNEF parser, for example, reads them from it). The others (X, x) are tensor shapes that are calculated from the graph by propagating the input shape (the parser does that as well).
Please check if I'm right.
p_H = floor( {(x_H - 1) * s_H + fd_H - X_H} / 2 )
q_H = ceil( {(x_H - 1) * s_H + fd_H - X_H} / 2 )
p_W = floor( {(x_W - 1) * s_W + fd_W - X_W} / 2 )
q_W = ceil( {(x_W - 1) * s_W + fd_W - X_W} / 2 )
where all the variables can be obtained by doing:
graph = load_model('dir/to/nnef_graph')
operation = graph.operations
tensors = graph.tensors
attribs = operation.attribs
input_tensor = tensors[operation.inputs['input']]
output_tensor = tensors[operation.outputs['output']]
stride = attribs['stride']
dilation = attribs['dilation']
X_H = input_tensor.shape[2]
X_W = input_tensor.shape[3]
x_H = output_tensor.shape[2]
x_W = output_tensor.shape[3]
s_H = stride[0]
s_W = stride[1]
fd_H = dilation[0]
fd_W = dilation[1]
You are almost right. First, you have to select a specific operation by some index idx:
operation = graph.operations[idx]
Second, you have to calculate fd yourself, because what is stored is the dilation:
d_H = dilation[0]
d_W = dilation[1]
filter = tensors[operation.inputs['filter']]
f_H = filter.shape[2]
f_W = filter.shape[3]
fd_H = (f_H - 1) * d_H + 1
fd_W = (f_W - 1) * d_W + 1
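Putting the pieces together, the auto-padding for one spatial dimension could be sketched as below. This is plain Python and does not use the NNEF parser; `dilated_extent` and `auto_padding` are hypothetical helper names. It follows the formulas quoted in this thread as-is and does not clamp a negative total, which is an assumption on my part.

```python
def dilated_extent(f, d):
    """Dilated filter size fd = (f - 1) * d + 1."""
    return (f - 1) * d + 1


def auto_padding(X, f, s, d=1):
    """Return (p, q) for one spatial dimension.

    X: input size, f: filter size, s: stride, d: dilation.
    The output size is taken as x = ceil(X / s).
    """
    fd = dilated_extent(f, d)
    x = -(-X // s)                # integer ceil(X / s)
    total = (x - 1) * s + fd - X  # total padding for this dimension
    p = total // 2                # floor(total / 2)
    q = total - p                 # ceil(total / 2)
    return p, q


# 7x7 filter, stride 2, no dilation, on a 224-wide input
print(auto_padding(224, 7, 2))
# 3x3 filter, stride 1, dilation 2, on a 5-wide input
print(auto_padding(5, 3, 1, d=2))
```

Call it once with the height values (X_H, f_H, s_H, d_H) and once with the width values to get [(p_H, q_H), (p_W, q_W)].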
Note that the NNEF tools repository has just been updated (the parser and converter code have been refactored). You now have to run shape inference separately after loading the graph, before accessing the shapes:
graph = nnef.load_model('dir/to/nnef_graph')
nnef.infer_shapes(graph)
Can this issue be closed?
Thank you very much for your detailed explanation. I really appreciate it.
Hi, I have a question regarding the auto-padding. If there is no filter tensor, how are f_H and f_W calculated? For example, in the Inception v4 model (floating point model) in the NNEF model zoo, line 796:
avg_pool = avg_pool(concat_2, border = 'constant', dilation = [], padding = [], size = [1, 1, 3, 3], stride = [1, 1, 1, 1]);
In this case, how should I calculate fd_H and fd_W values?
Pooling has a size, in this example size = [1,1,3,3], where the last two are the spatial dimensions (height and width), so f_H = 3 and f_W = 3. You can calculate fd_H and fd_W the same way as before.
So if there is a filter tensor,
f_H = filter.shape[2]
f_W = filter.shape[3]
otherwise,
f_H = size[2]
f_W = size[3].
Am I correct?
Yes, that is correct.
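For reference, the two cases can be folded into one small helper. This is only a sketch in plain Python: `dilated_kernel` is a hypothetical name, and it assumes that an empty dilation list means a dilation of 1 in every dimension and that the spatial entries are the last two of the list.

```python
def dilated_kernel(filter_shape=None, size=None, dilation=()):
    """Return (fd_H, fd_W) for a convolution or a pooling operation.

    filter_shape: shape of the filter tensor, if the operation has one.
    size: the 'size' attribute of a pooling operation otherwise.
    dilation: the 'dilation' attribute; empty is treated as all ones
    (an assumption of this sketch).
    """
    shape = filter_shape if filter_shape is not None else size
    if shape is None:
        raise ValueError("need either a filter shape or a pooling size")
    f_H, f_W = shape[2], shape[3]
    # Take the last two entries so that both spatial-only and
    # full-rank dilation lists work.
    d_H, d_W = (dilation[-2], dilation[-1]) if dilation else (1, 1)
    return (f_H - 1) * d_H + 1, (f_W - 1) * d_W + 1


# The avg_pool example from above: size = [1, 1, 3, 3], dilation = []
print(dilated_kernel(size=[1, 1, 3, 3], dilation=[]))
# A convolution with a 7x7 filter and dilation [2, 2]
print(dilated_kernel(filter_shape=[64, 3, 7, 7], dilation=[2, 2]))
```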
Thanks!
I am trying to convert the GoogLeNet prototxt (https://github.com/BVLC/caffe/blob/master/models/bvlc_googlenet/deploy.prototxt) using the caffe_to_nnef converter. However, for pooling layers with no pads specified (e.g. name: "pool1/3x3_s2"), it generates a padding.
For example, for the pooling layer named "pool1/3x3_s2", it generates a padding of [(0,0), (0,0), (0,1), (0,1)]. I think this is a similar problem to issue #31. Is this meant to happen?