Zardinality / TF_Deformable_Net

Deformable convolution net on Tensorflow
MIT License

Offsets Subnet Details #17

Closed: juanilarregui closed this issue 6 years ago

juanilarregui commented 6 years ago

Hi,

I was wondering about the details of the subnet that predicts the offsets used by the sampling operation, as I couldn't dilucidate them from the paper. This subnet takes the feature map and adds parallel (to the core network) conv layers to predict the offsets that will be used:

  1. How many conv layers are used to get the offsets from the features?
  2. What are their kernel size and activation function?
  3. Is batch norm used in this subnet?

Thanks for your great work.

Zardinality commented 6 years ago

Take the Res101 net for instance: all layers whose names end with "offset" in this file belong to the offset branch. You can find answers to all your questions in the relevant code fragments. You are welcome to ask more questions if you are still confused about the code. To be honest, the word "dilucidate" really baffles me :)

juanilarregui commented 6 years ago

Thanks for the quick response! So this offset subnet consists of only one conv layer, with no activation, a kernel size of 3x3, and a dilation rate of 2, am I right?

One new question: why is num_deform_group=4 in deform_conv? Or, to ask the same question a different way: why are there 72 outputs for the offsets? Of course, 72 = (3x3x2) x 4 deformable groups. I thought you only needed 18 outputs: 9 offsets "in the x direction" and 9 "in the y direction" (that is, 1 deformable group, I think).
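The channel arithmetic in this question can be written out as a small Python sketch. The helper name `offset_channels` is hypothetical and not part of the repository; it only assumes, as stated above, that each kernel sampling position needs one x and one y displacement per deformable group.

```python
def offset_channels(kernel_h: int, kernel_w: int, num_deform_group: int) -> int:
    """Number of output channels the offset conv layer must produce.

    Hypothetical helper for illustration; not a function from the repository.
    """
    # Each kernel sampling position gets a 2-D displacement (two values).
    per_group = 2 * kernel_h * kernel_w
    # Every deformable group carries its own full set of displacements.
    return per_group * num_deform_group


print(offset_channels(3, 3, 1))  # one deformable group
print(offset_channels(3, 3, 4))  # four deformable groups, as in the question
```

With a 3x3 kernel this gives 18 channels for one group and 72 for four, matching the numbers in the question.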

W.r.t. the word "dilucidate", sorry! I'm from Argentina; in Spanish we use "dilucidar" to mean "understand" or "comprehend", so I think I just extended it to English, haha. Next time I'll use "elucidate" or "understand", I promise ;)

Zardinality commented 6 years ago

You are right about the parameter settings and the arithmetic of num_deform_group. As to why num_deform_group is set to 4, I personally have no intuition about it; I just copied all the parameters from the original implementation. But one thing is true: a larger num_deform_group means more independent sets of offsets to manipulate, so different groups of channels can sample at different locations, which gives more varied receptive fields.
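To make the role of the groups more concrete, here is a minimal Python sketch of how input channels and offset channels could be paired up. It assumes a contiguous layout: input channels are split into equal consecutive blocks, and each block reads its own consecutive run of 2 x kernel_h x kernel_w offset channels. Both function names and the 512-channel example are hypothetical, and the actual memory layout should be checked against the repository's kernel code.

```python
def deform_group_of(channel: int, in_channels: int, num_deform_group: int) -> int:
    """Deformable group that a given input channel belongs to.

    Assumes channels are split into equal, contiguous blocks (an assumption
    for illustration, not confirmed by the thread).
    """
    if in_channels % num_deform_group != 0:
        raise ValueError("in_channels must be divisible by num_deform_group")
    channels_per_group = in_channels // num_deform_group
    return channel // channels_per_group


def offset_channel_range(group: int, kernel_h: int, kernel_w: int) -> range:
    """Offset channels used by one deformable group (assumed contiguous)."""
    per_group = 2 * kernel_h * kernel_w
    return range(group * per_group, (group + 1) * per_group)


# Hypothetical example: 512 input channels, 4 deformable groups, 3x3 kernel.
group = deform_group_of(200, in_channels=512, num_deform_group=4)
offsets = offset_channel_range(group, 3, 3)
print(group, offsets.start, offsets.stop)
```

Under these assumptions, channels within the same group share one set of sampling locations, while channels in different groups can be deformed independently.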