JunkyByte / deepcharuco

Unofficial PyTorch implementation of the model proposed in "Deep ChArUco: Dark ChArUco Marker Pose Estimation" (CVPR 2019, https://arxiv.org/abs/1812.03247) for ChArUco board localization.
MIT License

RefineNet #1

Closed by arsalanshakeel 1 year ago

arsalanshakeel commented 1 year ago

Hey @JunkyByte,

Thanks for the deepcharuco implementation. As I understand it from the diagram and the model architecture the authors describe in the paper, ChArUcoNet and RefineNet share the same architecture except for the final heads (ChArUcoNet has a two-headed output, RefineNet a single head). That is, 8 convolutions and 3 max-pool layers in total, with each max-pool following 2 consecutive convolution layers.

I noticed that you opted for upsampling in the RefineNet architecture. Was there a particular reason, or am I missing something? Please enlighten me.

JunkyByte commented 1 year ago

Hi @arsalanshakeel, thanks for your interest.

I have opted for upsampling because the architecture they presented for RefineNet is not clear to me.

Yes, it seems that the architecture should be the same up to the heads, but if you feed a (24, 24) patch into that VGG-based backbone, the last conv outputs a (C, 24/8, 24/8) tensor. How the head should be designed to produce a 4096D vector starting from this (C, 3, 3) tensor is a mystery to me.
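To make the shape arithmetic concrete, here is a minimal sketch in plain Python. It assumes 3x3 convolutions with padding=1 and 2x2 max-pools with stride 2, which is the usual VGG-style setup but is not confirmed by the paper text quoted here. The helper names `conv2d_out` and `backbone_out` are hypothetical and only used for illustration.

```python
def conv2d_out(size, kernel=3, stride=1, padding=1):
    """Spatial size after a conv or pooling layer (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def backbone_out(size, n_blocks=3):
    """Trace a VGG-style backbone: two 'same' 3x3 convs, then a 2x2 max-pool,
    repeated n_blocks times. The assumed final two convs (8 in total) also use
    padding=1, so they do not change the spatial size and are omitted here."""
    for _ in range(n_blocks):
        size = conv2d_out(size)  # 3x3 conv, padding=1: size unchanged
        size = conv2d_out(size)  # 3x3 conv, padding=1: size unchanged
        size = conv2d_out(size, kernel=2, stride=2, padding=0)  # 2x2 max-pool
    return size


patch_out = backbone_out(24)   # 24 -> 12 -> 6 -> 3
target_classes = 64 * 64       # 4096 subpixel locations in the 64x64 output

print(patch_out, target_classes)  # prints: 3 4096
```

So under these assumptions the backbone leaves a 3x3 spatial grid, while the output needs 4096 values, which is the gap the head would have to bridge.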

That's why I opted for upsampling: I use padding=0 in the first few convs to trim border pixels and obtain a (C, 16, 16) tensor. I then max-pool it to (C, 8, 8) and apply 2x upsampling three times to obtain a (1, 64, 64) output.
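The spatial sizes in this design can be traced with a short sketch. It assumes 3x3 kernels, in which case four unpadded convs are needed to go from 24 to 16. That count is inferred from the arithmetic, not taken from the repository code, and `refine_trace` is a hypothetical helper.

```python
def refine_trace(size=24, unpadded_convs=4, upsample_steps=3):
    """Return the spatial size after each stage of the upsampling design."""
    trace = [size]
    for _ in range(unpadded_convs):
        size -= 2          # assumed 3x3 conv with padding=0 trims 1 pixel per side
        trace.append(size)
    size //= 2             # 2x2 max-pool with stride 2
    trace.append(size)
    for _ in range(upsample_steps):
        size *= 2          # 2x upsampling
        trace.append(size)
    return trace


print(refine_trace())  # prints: [24, 22, 20, 18, 16, 8, 16, 32, 64]
```

The last entry is the 64x64 map, which flattens to the 4096 values the paper asks for.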

If you have a valid idea on how to design the RefineNet head, feel free to discuss it.