knmac closed this 6 years ago
What you referred to as num_groups is actually the num_deform_group flag in the layer function. The num_deform_group and num_group flags work in different ways. You can refer to this tutorial for how num_group works.
As for how num_deform_group works, you can imagine that it splits the input into num_deform_group parts along the channel dimension and uses a different set of offsets for each part. We therefore need num_deform_group different sets of offsets, which gives the equation you posted.
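To make the channel count concrete, here is a minimal Python sketch using the 3x3 kernel and 4 groups from the original question. The function names are hypothetical and not part of any library, and the even split of input channels across groups is an assumption:

```python
def offset_channels(kernel_h, kernel_w, num_deform_group):
    """Number of channels the offset branch must output."""
    # Two offsets (one per spatial axis) for every kernel tap,
    # repeated once per deformable group.
    return 2 * kernel_h * kernel_w * num_deform_group


def channels_per_deform_group(in_channels, num_deform_group):
    """Input channels that share one set of offsets."""
    # Assumption: the input channels divide evenly among the groups.
    if in_channels % num_deform_group != 0:
        raise ValueError("in_channels must be divisible by num_deform_group")
    return in_channels // num_deform_group


print(offset_channels(3, 3, 4))  # 72
```

With a hypothetical 512-channel input, channels_per_deform_group(512, 4) gives 128, i.e. each block of 128 input channels would be sampled with its own 18 offset channels.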
Thank you for your response. Could you tell me the order of the offset channels, e.g. which channels correspond to horizontal offsets and which to vertical offsets, which index of the convolution kernel they belong to, and which group? Also, when you split the offsets into num_deform_group parts, how do you determine which part of the input map corresponds to which group? Thank you very much.
These questions are rather subtle; you can find answers to most of them in the source of this operation. Here is a fragment that shows the indexing behavior of this op:
const int data_offset_h_ptr = ((2 * (i * kernel_w + j)) * height_col + h_col) * width_col + w_col;
const int data_offset_w_ptr = ((2 * (i * kernel_w + j) + 1) * height_col + h_col) * width_col + w_col;
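Read literally, the fragment says that for kernel tap (i, j) the vertical (h) offset sits in channel 2 * (i * kernel_w + j) and the horizontal (w) offset in the next channel, with the offset map indexed channel-first. The Python sketch below mirrors that arithmetic. The function names are hypothetical, and decode_channel additionally assumes that each group's offsets occupy one contiguous block of 2 * kernel_h * kernel_w channels, which the fragment alone does not show:

```python
def offset_channel_indices(i, j, kernel_w):
    """Channels (h, w) holding the offsets of kernel tap (i, j) within one group."""
    tap = i * kernel_w + j
    return 2 * tap, 2 * tap + 1


def flat_offset_index(channel, h_col, w_col, height_col, width_col):
    """Same arithmetic as data_offset_h_ptr / data_offset_w_ptr in the fragment."""
    return (channel * height_col + h_col) * width_col + w_col


def decode_channel(channel, kernel_h, kernel_w):
    """Map an offset channel to (group, i, j, axis)."""
    # Assumption: groups are laid out one after another along the channel axis.
    per_group = 2 * kernel_h * kernel_w
    group, rem = divmod(channel, per_group)
    tap, axis = divmod(rem, 2)
    i, j = divmod(tap, kernel_w)
    return group, i, j, "h" if axis == 0 else "w"


print(offset_channel_indices(1, 2, 3))  # (10, 11)
print(decode_channel(71, 3, 3))         # (3, 2, 2, 'w')
```

Under these assumptions, the last of the 72 channels in the question would hold the horizontal offset of the bottom-right kernel tap for the fourth group.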
What is the meaning of num_groups? As I understand it, the output of res5a_branch2b_offset has the shape (?, 14, 14, 72), where (14, 14) is the spatial dimension of the input and 72 = 2 (due to x and y) * 3 * 3 (kernel dimension is 3x3) * 4 (num_groups). But I can't understand the meaning of the num_groups variable here.