endernewton / tf-faster-rcnn

Tensorflow Faster RCNN for Object Detection
https://arxiv.org/pdf/1702.02138.pdf
MIT License

RoIPool question #265

Open trminh89 opened 6 years ago

trminh89 commented 6 years ago

Hi, I was reading your RoIPool implementation and can't understand a few lines:

```python
def _crop_pool_layer(self, bottom, rois, name):
  with tf.variable_scope(name) as scope:
    batch_ids = tf.squeeze(tf.slice(rois, [0, 0], [-1, 1], name="batch_id"), [1])
    # Get the normalized coordinates of bounding boxes
    bottom_shape = tf.shape(bottom)
    height = (tf.to_float(bottom_shape[1]) - 1.) * np.float32(self._feat_stride[0])
    width = (tf.to_float(bottom_shape[2]) - 1.) * np.float32(self._feat_stride[0])
    x1 = tf.slice(rois, [0, 1], [-1, 1], name="x1") / width
    y1 = tf.slice(rois, [0, 2], [-1, 1], name="y1") / height
    x2 = tf.slice(rois, [0, 3], [-1, 1], name="x2") / width
    y2 = tf.slice(rois, [0, 4], [-1, 1], name="y2") / height
    # Won't be back-propagated to rois anyway, but to save time
    bboxes = tf.stop_gradient(tf.concat([y1, x1, y2, x2], axis=1))
    pre_pool_size = cfg.POOLING_SIZE * 2
    crops = tf.image.crop_and_resize(bottom, bboxes, tf.to_int32(batch_ids), [pre_pool_size, pre_pool_size], name="crops")

  return slim.max_pool2d(crops, [2, 2], padding='SAME')
```
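For reference, `tf.image.crop_and_resize` takes boxes as normalized `[y1, x1, y2, x2]` and, per its documentation, maps a normalized coordinate y to y * (image_height - 1) on its input (here, the feature map `bottom`). The NumPy sketch below approximates that sampling for one box on a single-channel map, so the effect of the normalization can be checked by hand. The helper names are hypothetical, the sketch assumes the box lies inside the map, and it ignores `extrapolation_value` and batching.

```python
import numpy as np

def sample_positions(c1, c2, size, crop_size):
    """Feature-map positions sampled along one axis for normalized box edges c1, c2."""
    # A normalized coordinate c lands at pixel c * (size - 1).
    if crop_size > 1:
        step = (c2 - c1) * (size - 1) / (crop_size - 1)
        return c1 * (size - 1) + step * np.arange(crop_size)
    # A single sample is taken at the box center.
    return np.array([0.5 * (c1 + c2) * (size - 1)])

def bilinear(fmap, y, x):
    """Bilinearly interpolate a 2-D array at an in-bounds (y, x) position."""
    h, w = fmap.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0, x0] * (1 - dx) + fmap[y0, x1] * dx
    bottom = fmap[y1, x0] * (1 - dx) + fmap[y1, x1] * dx
    return top * (1 - dy) + bottom * dy

def crop_and_resize_np(fmap, box, crop_size):
    """Single-box, single-channel approximation; box = (y1, x1, y2, x2), normalized."""
    h, w = fmap.shape
    ys = sample_positions(box[0], box[2], h, crop_size)
    xs = sample_positions(box[1], box[3], w, crop_size)
    return np.array([[bilinear(fmap, y, x) for x in xs] for y in ys])

fmap = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 map, value = 5*y + x
print(crop_and_resize_np(fmap, (0.0, 0.0, 0.5, 0.5), 3))  # top-left 3x3 corner of fmap
```

Because the op itself multiplies by (H - 1), a box normalized by (H - 1) * stride is sampled at the RoI coordinate divided by the stride: for example, with H = 5 and stride 16, an RoI edge at y = 32 becomes 32 / 64 = 0.5 and is sampled at 0.5 * 4 = 2 = 32 / 16.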

My question is: why do you subtract 1 in `height = (tf.to_float(bottom_shape[1]) - 1.) * np.float32(self._feat_stride[0])`? Won't that shift the bbox? Why not just take x1, y1, x2, y2 from `rois` and then divide by `self._feat_stride[0]`? Something like:

```python
x1 = tf.slice(rois, [0, 1], [-1, 1], name="x1")
y1 = tf.slice(rois, [0, 2], [-1, 1], name="y1")
x2 = tf.slice(rois, [0, 3], [-1, 1], name="x2")
y2 = tf.slice(rois, [0, 4], [-1, 1], name="y2")
## then
bboxes = tf.concat([y1, x1, y2, x2], axis=1)
bboxes = bboxes * (1.0 / self._feat_stride[0]) ## will get better coordinates here?
```
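One way to compare the two approaches: dividing by the stride alone yields feature-map pixel coordinates, whereas `crop_and_resize` expects normalized coordinates that it multiplies by (H - 1) itself. The NumPy sketch below (hypothetical helper names; a single stride for both axes, as with `self._feat_stride[0]` in the repo; the RoI and feature-map sizes are made-up values) computes both and checks that the proposal matches the repo's boxes once it is further divided by (H - 1) and (W - 1).

```python
import numpy as np

def repo_normalize(rois, feat_h, feat_w, stride):
    """Mirror of _crop_pool_layer: rows are (batch_id, x1, y1, x2, y2) in image pixels."""
    height = (feat_h - 1.0) * stride
    width = (feat_w - 1.0) * stride
    x1, y1, x2, y2 = rois[:, 1], rois[:, 2], rois[:, 3], rois[:, 4]
    return np.stack([y1 / height, x1 / width, y2 / height, x2 / width], axis=1)

def proposal_coords(rois, stride):
    """The proposal from the question: (y1, x1, y2, x2) divided by the stride."""
    x1, y1, x2, y2 = rois[:, 1], rois[:, 2], rois[:, 3], rois[:, 4]
    return np.stack([y1, x1, y2, x2], axis=1) * (1.0 / stride)

rois = np.array([[0.0, 16.0, 32.0, 160.0, 96.0]])  # one made-up RoI
feat_h, feat_w, stride = 38, 50, 16                 # assumed feature-map size and stride
repo = repo_normalize(rois, feat_h, feat_w, stride)
prop = proposal_coords(rois, stride)                # feature-map pixels, not normalized
# Dividing the proposal by (H - 1, W - 1) recovers the repo's normalized boxes.
scale = np.array([feat_h - 1, feat_w - 1, feat_h - 1, feat_w - 1], dtype=float)
print(np.allclose(prop / scale, repo))  # True
```

Passed directly to `crop_and_resize`, the proposal's values (here y1 = 2.0, x2 = 10.0) would be read as normalized coordinates and point far outside the feature map.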
endernewton commented 6 years ago

You can try :) I was following the documentation of `crop_and_resize`, which subtracts one: a normalized coordinate y is mapped to y * (image_height - 1).
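Spelled out, with H the feature-map height (`bottom_shape[1]`), s = `self._feat_stride[0]`, and y an RoI coordinate in input-image pixels (notation introduced here for illustration), the normalization in `_crop_pool_layer` and the mapping applied by `crop_and_resize` compose as:

$$
\hat{y} = \frac{y}{(H - 1)\,s}, \qquad
y_{\mathrm{feat}} = \hat{y}\,(H - 1) = \frac{y}{s}
$$

and likewise for x with W = `bottom_shape[2]`. Under the convention that feature-map pixel i corresponds to image coordinate i * s, the `- 1.` therefore does not shift the box: it cancels the (H - 1) factor that `crop_and_resize` applies, so the sampled position is y / s, the same value the proposal computes before normalization.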

kl456123 commented 6 years ago

I'd like to know why `cfg.POOLING_SIZE * 2` is used. Please help, thanks.
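The thread does not answer this, but the shapes suggest one reading: `crop_and_resize` produces a 2P x 2P crop (P = `cfg.POOLING_SIZE`), and `slim.max_pool2d(crops, [2, 2], padding='SAME')`, whose default stride is 2, reduces it to P x P. Each output bin is then the max over 2 x 2 bilinearly sampled points, loosely mimicking RoIPool's max over a bin. The NumPy sketch below only illustrates that shape arithmetic and per-bin max for an even-sized, single-channel crop; `max_pool_2x2` is a hypothetical helper and P = 7 is an assumed value.

```python
import numpy as np

def max_pool_2x2(crop):
    """2x2 max pooling with stride 2 on an even-sized 2-D array."""
    h, w = crop.shape
    assert h % 2 == 0 and w % 2 == 0, "sketch assumes an even crop size"
    # Group pixels into non-overlapping 2x2 blocks and keep each block's max.
    return crop.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

pooling_size = 7                    # assumed value of cfg.POOLING_SIZE
pre_pool_size = pooling_size * 2    # 14, as in _crop_pool_layer
crop = np.random.default_rng(0).random((pre_pool_size, pre_pool_size))
pooled = max_pool_2x2(crop)
print(pooled.shape)  # (7, 7)
```

With an even crop size, `'SAME'` padding adds nothing, so the result matches plain 2 x 2 pooling with stride 2.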