Open trminh89 opened 6 years ago
Hi, I read your RoIPool operation and can't understand some lines:
    def _crop_pool_layer(self, bottom, rois, name):
        with tf.variable_scope(name) as scope:
            batch_ids = tf.squeeze(tf.slice(rois, [0, 0], [-1, 1], name="batch_id"), [1])
            # Get the normalized coordinates of bounding boxes
            bottom_shape = tf.shape(bottom)
            height = (tf.to_float(bottom_shape[1]) - 1.) * np.float32(self._feat_stride[0])
            width = (tf.to_float(bottom_shape[2]) - 1.) * np.float32(self._feat_stride[0])
            x1 = tf.slice(rois, [0, 1], [-1, 1], name="x1") / width
            y1 = tf.slice(rois, [0, 2], [-1, 1], name="y1") / height
            x2 = tf.slice(rois, [0, 3], [-1, 1], name="x2") / width
            y2 = tf.slice(rois, [0, 4], [-1, 1], name="y2") / height
            # Won't be back-propagated to rois anyway, but to save time
            bboxes = tf.stop_gradient(tf.concat([y1, x1, y2, x2], axis=1))
            pre_pool_size = cfg.POOLING_SIZE * 2
            crops = tf.image.crop_and_resize(bottom, bboxes, tf.to_int32(batch_ids), [pre_pool_size, pre_pool_size], name="crops")

        return slim.max_pool2d(crops, [2, 2], padding='SAME')
My question is: why do you subtract 1 in `height = (tf.to_float(bottom_shape[1]) - 1.) * np.float32(self._feat_stride[0])`? Will it shift the bbox? Why don't we just take `x1, y1, x2, y2` from `rois` and then divide by `self._feat_stride[0]`? Something like:
    x1 = tf.slice(rois, [0, 1], [-1, 1], name="x1")
    y1 = tf.slice(rois, [0, 2], [-1, 1], name="y1")
    x2 = tf.slice(rois, [0, 3], [-1, 1], name="x2")
    y2 = tf.slice(rois, [0, 4], [-1, 1], name="y2")
    ## then
    bboxes = tf.concat([y1, x1, y2, x2], axis=1)
    bboxes = bboxes * (1.0 / self._feat_stride[0])  ## will get better coordinates here?
You can try :) I was following the documentation of crop_and_resize, which subtracts one.
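For anyone following along, here is a small NumPy sketch of the arithmetic, not the repository's code. It relies on the mapping described in the `tf.image.crop_and_resize` documentation: a normalized coordinate `y` is mapped to `y * (image_height - 1)`. The feature-map size, the stride, the example box, and the helper names `normalize_rois` and `to_feature_coords` are all made up for illustration. Under these assumptions the `- 1` cancels out, so the box ends up at `roi / feat_stride` in feature-map coordinates and is not shifted.

```python
import numpy as np

def normalize_rois(rois_xyxy, feat_h, feat_w, feat_stride):
    """Normalize image-space boxes the same way _crop_pool_layer does (hypothetical helper)."""
    height = (feat_h - 1.0) * feat_stride
    width = (feat_w - 1.0) * feat_stride
    x1, y1, x2, y2 = rois_xyxy.T
    # crop_and_resize expects boxes ordered as [y1, x1, y2, x2]
    return np.stack([y1 / height, x1 / width, y2 / height, x2 / width], axis=1)

def to_feature_coords(norm_boxes, feat_h, feat_w):
    """Apply the documented mapping y -> y * (image_height - 1) (hypothetical helper)."""
    scale = np.array([feat_h - 1.0, feat_w - 1.0, feat_h - 1.0, feat_w - 1.0])
    return norm_boxes * scale

# Assumed values, chosen only for this example
feat_h, feat_w, feat_stride = 38, 50, 16
rois = np.array([[32.0, 64.0, 320.0, 256.0]])  # (x1, y1, x2, y2) in input-image pixels

norm = normalize_rois(rois, feat_h, feat_w, feat_stride)
feat = to_feature_coords(norm, feat_h, feat_w)

# The sampled box equals roi / feat_stride, reordered to [y1, x1, y2, x2]
expected = rois[:, [1, 0, 3, 2]] / feat_stride
print(np.allclose(feat, expected))
```

Note that the alternative proposed in the question, `bboxes * (1.0 / self._feat_stride[0])`, yields feature-map pixel coordinates. `crop_and_resize` would interpret those as normalized values, so they would still need to be divided by `(size - 1)` before being passed in.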
I want to know why cfg.POOLING_SIZE * 2 is used. Please help me, thanks.
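The thread does not give the author's reason, so the following is only one possible reading. The crop is resampled to twice the pooling size, and the 2x2 max pool that follows (`slim.max_pool2d` uses stride 2 by default) halves it again, so the layer outputs `POOLING_SIZE x POOLING_SIZE` features while still taking a max over neighbouring samples, roughly as RoI max pooling does. The NumPy sketch below only checks the shapes. `POOLING_SIZE = 7`, the channel count, and the helper `max_pool_2x2` are assumptions for illustration.

```python
import numpy as np

POOLING_SIZE = 7  # assumed value; the real one comes from cfg.POOLING_SIZE

def max_pool_2x2(crops):
    """2x2 max pooling with stride 2 over an (N, H, W, C) array; H and W must be even."""
    n, h, w, c = crops.shape
    # Split H and W into (blocks, 2) and take the max inside each 2x2 block
    return crops.reshape(n, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))

pre_pool_size = POOLING_SIZE * 2
# Random stand-in for the output of crop_and_resize: 3 RoIs, 512 channels (both assumed)
crops = np.random.rand(3, pre_pool_size, pre_pool_size, 512)
pooled = max_pool_2x2(crops)
print(pooled.shape)  # (3, 7, 7, 512)
```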