JialeCao001 / D2Det

D2Det: Towards High Quality Object Detection and Instance Segmentation (CVPR2020)
https://openaccess.thecvf.com/content_CVPR_2020/papers/Cao_D2Det_Towards_High_Quality_Object_Detection_and_Instance_Segmentation_CVPR_2020_paper.pdf
MIT License

Discriminative ROI Pooling #29

Open Kevin43614 opened 3 years ago

Kevin43614 commented 3 years ago

Hello, I have a question about your paper. You say you use a pooling size of 7 × 7 (where k = 7) for classification, and that "light-weight offset prediction only requires a k/2 × k/2 sized RoIAlign". Does that mean a 3.5 × 3.5 feature map is passed through the fully connected layers?

JialeCao001 commented 3 years ago

@Kevin43614 Thanks for your interest. Yes. As I remember, we use 3x3 for offset prediction.
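To make the size arithmetic in this reply concrete: with k = 7, a k/2 × k/2 RoIAlign comes out as 3 × 3 if k/2 is taken as integer division, which matches the 3x3 mentioned above. The snippet below is only a sketch of that arithmetic. The helper name `offset_branch_input` and the channel count of 256 are assumptions for illustration, not values taken from the paper or the repository.

```python
def offset_branch_input(k, channels):
    """Return (side, flattened length) of the RoIAlign output fed to the fc layers."""
    side = k // 2  # integer division: 7 // 2 == 3, not 3.5
    return side, channels * side * side


# channels=256 is an assumed value, used only to show the flattened size.
side, flat_len = offset_branch_input(k=7, channels=256)
print(side, flat_len)  # 3 2304
```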

Kevin43614 commented 3 years ago

@JialeCao001 If the input to offset prediction is 3x3 and it goes through fully connected layers, how do you do RoIAlign and generate a 2k × 2k (14 × 14) feature map?

JialeCao001 commented 3 years ago

@Kevin43614 After the fc layers, we reshape the output vector into a feature map and then upsample that feature map.
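The reshape-then-upsample step described in this reply can be sketched in plain Python. This is not the repository's code. It assumes the last fc layer emits 2 × k × k values (one x and one y offset per sub-region), that they are laid out row-major, and that the map is upsampled by a factor of 2 with nearest-neighbour interpolation to reach the 2k × 2k size asked about in this thread. The thread does not state the intermediate size or the interpolation mode. The helper names `reshape_to_map` and `upsample_nearest` are hypothetical. With PyTorch tensors the same two steps would typically be written with `Tensor.view` and `torch.nn.functional.interpolate`.

```python
def reshape_to_map(vec, channels, size):
    """Reshape a flat list of length channels*size*size to [channels][size][size]."""
    if len(vec) != channels * size * size:
        raise ValueError("vector length does not match channels * size * size")
    return [
        [vec[(c * size + r) * size:(c * size + r + 1) * size] for r in range(size)]
        for c in range(channels)
    ]


def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of every channel by an integer factor."""
    out = []
    for channel in fmap:
        up = []
        for row in channel:
            # Repeat each value `factor` times along the width...
            wide = [v for v in row for _ in range(factor)]
            # ...and repeat the widened row `factor` times along the height.
            up.extend(list(wide) for _ in range(factor))
        out.append(up)
    return out


k = 7
# Stand-in for the output of the last fc layer (made-up values).
fc_out = [float(i) for i in range(2 * k * k)]
offsets = reshape_to_map(fc_out, channels=2, size=k)   # 2 x 7 x 7
offsets_up = upsample_nearest(offsets, factor=2)       # 2 x 14 x 14
print(len(offsets_up), len(offsets_up[0]), len(offsets_up[0][0]))  # 2 14 14
```

Nearest-neighbour is used here only because it is the simplest mode to write out by hand. Bilinear interpolation would be an equally plausible choice.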

z0978916348 commented 3 years ago

@JialeCao001
I am also confused about this part. Can you provide more details about the operations that take the output of the three fc layers and generate the (2k x 2k) resolution feature map?

JialeCao001 commented 3 years ago

@z0978916348 Please refer to the code: https://github.com/JialeCao001/D2Det/blob/a76781ab624a1304f9c15679852a73b4b6770950/mmdet/ops/dcn/deform_pool.py#L199