CharlesShang / FastMaskRCNN

Mask RCNN in TensorFlow
Apache License 2.0
3.1k stars 1.1k forks

ROIAlign #1

Closed yexiguafuqihao closed 7 years ago

yexiguafuqihao commented 7 years ago

Nice job! I wonder where the ROIAlign layer is, or has it not been implemented yet?

CharlesShang commented 7 years ago

Haha, I spent a whole day implementing it in C++, then found that TF already has an op, tf.image.crop_and_resize, and it's differentiable.
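
For anyone looking for a starting point, here is a minimal sketch of how that op can serve as an RoIAlign-style crop. The tensor names and shapes are illustrative, not taken from this repo; boxes must be normalized [y1, x1, y2, x2] in [0, 1] relative to the feature map.

```python
import tensorflow as tf

# Illustrative feature map and RoIs (TF 1.x style placeholders).
features = tf.placeholder(tf.float32, [1, 64, 64, 256])   # [batch, H, W, C]
rois = tf.placeholder(tf.float32, [None, 4])               # normalized [y1, x1, y2, x2]
box_ind = tf.zeros([tf.shape(rois)[0]], dtype=tf.int32)    # every RoI comes from image 0

# Bilinear crop to a fixed 7x7 grid with no coordinate rounding.
pooled = tf.image.crop_and_resize(features, rois, box_ind, crop_size=[7, 7])
```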

amrit110 commented 7 years ago

Nice work. Is it then just a matter of using that layer and fixing the quantization according to the paper, assuming the resize operation uses bilinear interpolation? I was going to attempt that.

CharlesShang commented 7 years ago

@amrit110, I guess so.

xiaoxingzeng commented 7 years ago

@CharlesShang, can I implement the ROIAlign layer in py-faster-rcnn using crop_and_resize from TensorFlow?

1292765944 commented 7 years ago

@CharlesShang I do not understand this op. Can you explain how TensorFlow crops the feature map and resizes it to 7x7 when the input RoIs have floating-point positions? In my understanding, we first compute the positions of all 7x7 output samples in the input feature map, then for each floating-point output position we use bilinear interpolation to compute its exact value. Is that how this op is implemented in TF? However, that uses only a single value per bin, whereas Kaiming uses four. Do you plan to implement that in the future? Best!
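
For reference, bilinear interpolation at one floating-point sample position looks roughly like this. This is a NumPy sketch of the idea described above, not the actual TF kernel.

```python
import numpy as np

# Bilinearly interpolate a single-channel feature map at a
# floating-point position (y, x), as each output sample would do.
def bilinear_sample(feat, y, x):
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, feat.shape[0] - 1), min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0, x0] * (1 - dy) * (1 - dx) +
            feat[y0, x1] * (1 - dy) * dx +
            feat[y1, x0] * dy * (1 - dx) +
            feat[y1, x1] * dy * dx)

feat = np.arange(16, dtype=np.float32).reshape(4, 4)
print(bilinear_sample(feat, 1.5, 2.25))  # 8.25, blended from the 4 neighbours
```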

xmyqsh commented 7 years ago

@1292765944 @CharlesShang

    We sample four regular locations, so that we can evaluate either max or average pooling. In fact, interpolating only a single value at each bin center (without pooling) is nearly as effective. One could also sample more than four locations per bin, which we found to give diminishing returns.

The paper says at the end of page 3 that interpolating only a single value at each bin center (without pooling) is nearly as effective, and this is equivalent to the implementation with crop_and_resize.

Well, I think implementing a four-sample pooled version of ROIAlign is not difficult: just crop_and_resize to 14x14, then apply max or average pooling with a 2x2 kernel and stride [2, 2], as sketched below. Isn't that right?
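
A minimal sketch of that approach, again with illustrative tensors (average pooling and TF 1.x placeholders are just one choice here):

```python
import tensorflow as tf

# Illustrative tensors; boxes are normalized [y1, x1, y2, x2] in [0, 1].
features = tf.placeholder(tf.float32, [1, 64, 64, 256])
rois = tf.placeholder(tf.float32, [None, 4])
box_ind = tf.zeros([tf.shape(rois)[0]], dtype=tf.int32)

# Crop to twice the output resolution, then pool each 2x2 block so that
# every bin of the 7x7 output aggregates four bilinearly sampled values.
crops = tf.image.crop_and_resize(features, rois, box_ind, crop_size=[14, 14])
pooled = tf.nn.avg_pool(crops, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1],
                        padding='VALID')                  # shape [num_rois, 7, 7, 256]
```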

1292765944 commented 7 years ago

@xmyqsh Thank you very much for the reply! I still have two questions.

  1. Where can I find the implementation of the crop_and_resize operator? I cannot find it in the corresponding file tensorflow/python/ops/gen_image_ops.py.
  2. I wonder about the difference between RoIWarp and RoIAlign. In RoIWarp, the RoI coordinates are divided by 16 and quantized, while RoIAlign avoids this by keeping floating-point RoI positions. Then both operators use bilinear interpolation to compute the 7x7 RoI features. Am I right? Best!

xmyqsh commented 7 years ago

@1292765944

  1. $TENSORFLOW_SOURCE_CODE/tensorflow/core/kernels/crop_and_resize_op.cc
  2. [x/16] is an example of quantization, but RoIWarp is not that. You can find the details of RoIWarp in Instance-aware Semantic Segmentation via Multi-task Network Cascades. I have checked the RoIWarp function: it is exactly a crop and warp with bilinear interpolation, which seems the same as crop_and_resize, with a pooling op following RoIWarp. The Mask R-CNN paper says:

    Unlike RoIAlign, RoIWarp overlooked the alignment issue and was implemented in [7] as quantizing RoI just like RoIPool.

I can't find any quantization in the RoIWarp functions. I think we should read the RoIWarp source in MNC for more detail.
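
To make the quantization point concrete, here is a toy illustration (plain Python, not code from either paper) of the difference between rounding RoI coordinates at stride 16 and keeping them as floats the way RoIAlign does:

```python
stride = 16.0
roi_x1_image = 203.0                        # made-up RoI coordinate in image pixels

x_quantized = round(roi_x1_image / stride)  # RoIPool-style quantization: 13
x_float = roi_x1_image / stride             # RoIAlign keeps 12.6875 and interpolates
print(x_quantized, x_float)
```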