Meaning of num_groups in offsets

Zardinality / TF_Deformable_Net

Deformable convolution net on Tensorflow

MIT License


Meaning of num_groups in offsets #18

Closed knmac closed 6 years ago

knmac commented 6 years ago

What is the meaning of num_groups? As I understand it, the output of res5a_branch2b_offset has the shape (?, 14, 14, 72), where (14, 14) is the spatial dimension of the input and 72 = 2 (due to x and y) * 3 * 3 (kernel dimension is 3x3) * 4 (num_groups). But I can't understand the meaning of the num_groups variable here.

Zardinality commented 6 years ago

What you mentioned as num_groups is actually the num_deform_group flag in the layer function. The num_deform_group and num_groups flags work in different ways. You can refer to this tutorial for how num_group works. As for how num_deform_group works, you can simply imagine that it splits the input into num_deform_group parts along the channel dimension and uses a different set of offsets for each part. Therefore we need num_deform_group different sets of offsets, which gives the equation you posted.
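To make the channel arithmetic above concrete, here is a minimal sketch in plain Python. The function names are hypothetical, and the 512-channel input in the usage note is only an illustrative value. The assumption that the channel split is contiguous and even (channel c belongs to group c // channels_per_group) is not stated in this thread, so please check it against the op's source.

```python
def offset_channels(kernel_h, kernel_w, num_deform_group):
    """Number of offset channels: one (y, x) pair per kernel position per deformable group."""
    return 2 * kernel_h * kernel_w * num_deform_group


def deform_group_of_channel(c, in_channels, num_deform_group):
    """Deformable group that input channel c falls into.

    Assumption (not confirmed in this thread): the input channels are split
    into contiguous, equally sized blocks, one block per deformable group.
    """
    if in_channels % num_deform_group != 0:
        raise ValueError("in_channels must be divisible by num_deform_group")
    channels_per_group = in_channels // num_deform_group
    return c // channels_per_group
```

For a 3x3 kernel with num_deform_group = 4, offset_channels(3, 3, 4) gives the 72 from the question. With a hypothetical 512-channel input, channels 0..127 would use the first set of offsets, 128..255 the second, and so on.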

knmac commented 6 years ago

Thank you for your response. Could you tell me the order of the offset channels, e.g. which channel corresponds to horizontal offsets and which to vertical offsets, which index of the convolution kernel, and which group? Also, when you split the offsets into num_deform_group parts, how do you determine which part of the input map corresponds to which group? Thank you very much.

Zardinality commented 6 years ago

These questions are rather subtle; you can find answers to most of them in the source of this operation. Here is a fragment showing how the op indexes into the offsets:

const int data_offset_h_ptr = ((2 * (i * kernel_w + j)) * height_col + h_col) * width_col + w_col;
const int data_offset_w_ptr = ((2 * (i * kernel_w + j) + 1) * height_col + h_col) * width_col + w_col;
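
The two lines above can be read as follows: for kernel position (i, j), the vertical (h) offset sits in channel 2 * (i * kernel_w + j) and the horizontal (w) offset in the next channel, with the buffer laid out channel-first as (channel, h_col, w_col). Below is a small Python sketch of the same arithmetic. The function name is hypothetical, and it assumes the indices are relative to the start of one deformable group's offsets, which the fragment itself does not show. How this channel-first layout maps onto the (?, 14, 14, 72) tensor from the question depends on code not quoted in this thread.

```python
def offset_indices(i, j, h_col, w_col, kernel_w, height_col, width_col):
    """Flat indices of the vertical and horizontal offsets for kernel position (i, j)
    at output location (h_col, w_col), mirroring the quoted fragment.

    Assumption: indices are relative to the first element of a single
    deformable group's offset block.
    """
    k = i * kernel_w + j          # row-major index of the kernel position
    channel_h = 2 * k             # vertical (y) offset channel
    channel_w = 2 * k + 1         # horizontal (x) offset channel
    idx_h = (channel_h * height_col + h_col) * width_col + w_col
    idx_w = (channel_w * height_col + h_col) * width_col + w_col
    return idx_h, idx_w
```

For a 3x3 kernel on a 14x14 output, one group holds 2 * 3 * 3 * 14 * 14 = 3528 offsets, so the last kernel position at the last output location should land on index 3527 for the horizontal offset.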