Closed LiangXu123 closed 6 years ago
After the correlation layers in Conv3, Conv4, and Conv5, the (h, w) dims of the correlation outputs should be the same as the (h, w) of the bbox reg feature map. At this point, you can simply concat the features along the channel dimension, apply a 1x1 convolution to reduce the number of channels to 4 * n_reg_classes * 7 * 7 (e.g. n_reg_classes=1 for class-agnostic bbox regression), and run a pooling operation (e.g. position-sensitive RoI pooling) on that feature map.
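To make the shapes concrete, here is a rough pure-Python sketch of those three steps (concat, 1x1 conv, position-sensitive pooling) on toy constant-valued feature maps. It is not the code from the paper or any repo. The helper names (`const_map`, `concat_channels`, `conv1x1`, `ps_roi_pool`), the `[channel][row][col]` layout, the channel ordering inside the pooling, the uniform weights, and the toy values `d = 1` and a 7x7 feature map are all assumptions made for illustration.

```python
import math

def const_map(channels, h, w, value):
    """Toy feature map as nested lists with layout [channel][row][col]."""
    return [[[value] * w for _ in range(h)] for _ in range(channels)]

def concat_channels(*maps):
    """Stack feature maps along the channel axis; all (h, w) must match."""
    h, w = len(maps[0][0]), len(maps[0][0][0])
    out = []
    for m in maps:
        assert (len(m[0]), len(m[0][0])) == (h, w), "(h, w) must match"
        out.extend(m)
    return out

def conv1x1(fmap, weights):
    """1x1 conv: weights[o][c] mixes input channel c into output channel o."""
    h, w, n_in = len(fmap[0]), len(fmap[0][0]), len(fmap)
    return [[[sum(row[c] * fmap[c][y][x] for c in range(n_in))
              for x in range(w)] for y in range(h)] for row in weights]

def ps_roi_pool(fmap, roi, k, n_out):
    """Simplified position-sensitive RoI average pooling.

    roi = (x0, y0, x1, y1) in feature-map pixels, end-exclusive.
    Bin (i, j) of output c reads only channel c*k*k + i*k + j
    (this channel ordering is an assumption).
    """
    x0, y0, x1, y1 = roi
    rw, rh = x1 - x0, y1 - y0
    out = []
    for c in range(n_out):
        grid = []
        for i in range(k):
            ys = range(y0 + (i * rh) // k, y0 + math.ceil((i + 1) * rh / k))
            row = []
            for j in range(k):
                xs = range(x0 + (j * rw) // k, x0 + math.ceil((j + 1) * rw / k))
                ch = fmap[c * k * k + i * k + j]
                vals = [ch[y][x] for y in ys for x in xs]
                row.append(sum(vals) / len(vals))
            grid.append(row)
        out.append(grid)
    return out

d = 1               # toy max displacement (assumption), gives (2d+1)^2 channels
k = 7               # pooling grid size, as in the 7 * 7 above
n_reg_classes = 1   # class-agnostic bbox regression
h, w = 7, 7         # toy spatial size shared by all maps (assumption)

corr_channels = (2 * d + 1) ** 2
xcorr = [const_map(corr_channels, h, w, 1.0) for _ in range(3)]  # conv3/4/5
reg_channels = 4 * n_reg_classes * k * k
xreg_t = const_map(reg_channels, h, w, 1.0)
xreg_t1 = const_map(reg_channels, h, w, 1.0)

# Step 1: concat along the channel dimension.
stacked = concat_channels(*xcorr, xreg_t, xreg_t1)

# Step 2: 1x1 conv down to 4 * n_reg_classes * k * k channels.
# Uniform weights stand in for learned ones.
n_in = len(stacked)
weights = [[1.0 / n_in] * n_in for _ in range(reg_channels)]
reduced = conv1x1(stacked, weights)

# Step 3: position-sensitive pooling over one RoI, then average the k*k bins.
pooled = ps_roi_pool(reduced, (0, 0, w, h), k, 4 * n_reg_classes)
box_offsets = [sum(sum(r) for r in g) / (k * k) for g in pooled]

print(len(stacked), len(reduced), len(pooled), len(box_offsets))
```

In a real network the weights are learned and the RoI comes from the proposals, but the channel bookkeeping is the same: the stacked map has 3 * (2d+1)^2 + 2 * 4 * n_reg_classes * k * k channels going into the 1x1 conv.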
Got that now, thank you for being so patient.
Thanks for your excellent work, but I am confused about some details in your paper. In the paper, the tracking RoI pooling layer operates on the stack of {Xcorr, Xreg-t, Xreg-t+1}. As far as I can see, both Xreg-t and Xreg-t+1 have a shape of k * k * 4, while Xcorr consists of the correlation outputs of conv3, conv4, and conv5, each of which should have a shape like H * W * (2d+1) * (2d+1). So:
1. How do you concat different layers together, e.g. Xcorr and Xreg-t?
2. How do you pool on the stacked feature map?
Thank you.