Yuxin-CV opened this issue 4 years ago
@Yuxin-CV I don't think it is correct to normalize the coordinates to (-1, 1): that implies (x, y) is the center of the map, which is not always true. Just scaling the coordinates down by a constant (e.g., 400) is fine.
Just to be clear, the correct code should be like the following. For a location of interest (x, y) on the input image:

```python
x_range = torch.arange(W_mask, dtype=torch.float32)
y_range = torch.arange(H_mask, dtype=torch.float32)
y_grid, x_grid = torch.meshgrid(y_range, x_range)
# map (x, y) from the input image to the mask feature map, then take the
# offset to every location on the map and scale it down by a constant
y_rel_coord = (y_grid - (y - mask_stride // 2) / mask_stride) / 400.0
x_rel_coord = (x_grid - (x - mask_stride // 2) / mask_stride) / 400.0
rel_coord = torch.stack([x_rel_coord, y_rel_coord], dim=0)  # (2, H_mask, W_mask)
```
Note that mapping (x, y) from the input image to the feature maps should be ((x - stride/2) / stride, (y - stride/2) / stride).
Thanks for your prompt reply. @tianzhi0549
Hello @tianzhi0549, I have implemented the rel. coord. according to your hint above. However, the AP dropped by 3 points. Could you check the correctness of my code? Thanks.
For CondInst:

```python
N, C, h, w = self.masks.shape
mask_stride = self.strides[0]
x_range = torch.arange(w)
y_range = torch.arange(h)
y_grid, x_grid = torch.meshgrid(y_range, x_range)
y_grid = y_grid.to(self.masks.device)
x_grid = x_grid.to(self.masks.device)
r_h = int(h * self.strides[0])
r_w = int(w * self.strides[0])
targets_masks = [target_im.gt_masks.tensor for target_im in self.gt_instances]
masks_t = self.prepare_masks(h, w, r_h, r_w, targets_masks)
mask_loss = self.masks[0].new_tensor(0.0)
batch_ins = im_idxes.shape[0]
# for each image
for i in range(N):
    inds = (im_idxes == i).nonzero().flatten()
    ins_num = inds.shape[0]
    if ins_num > 0:
        # split the predicted controllers into the weights/biases of the three
        # 1x1 convs of the dynamic mask head (10 -> 8 -> 8 -> 1 channels, where
        # the 10 input channels are 8 mask features + 2 rel. coord channels)
        controllers = controllers_pred[inds]
        weights1 = controllers[:, :80].reshape(-1, 8, 10).reshape(-1, 10).unsqueeze(-1).unsqueeze(-1)
        bias1 = controllers[:, 80:88].flatten()
        weights2 = controllers[:, 88:152].reshape(-1, 8, 8).reshape(-1, 8).unsqueeze(-1).unsqueeze(-1)
        bias2 = controllers[:, 152:160].flatten()
        weights3 = controllers[:, 160:168].unsqueeze(-1).unsqueeze(-1)
        bias3 = controllers[:, 168:169].flatten()
        mask_feat = self.masks[None, i]  # (1, C, h, w)
        location = locations[inds]
        x = location[:, 0]
        y = location[:, 1]
        y_rel_coord = (y_grid[None, None, ...] - (y[None, ..., None, None] - mask_stride // 2) / mask_stride) / self.coord_constant  # (1, ins_num, h, w)
        x_rel_coord = (x_grid[None, None, ...] - (x[None, ..., None, None] - mask_stride // 2) / mask_stride) / self.coord_constant  # (1, ins_num, h, w)
        # concatenate the shared mask features with each instance's rel. coords
        mask_feat_coord_cats = []
        for j in range(ins_num):
            mask_feat_coord_cat = torch.cat([mask_feat, x_rel_coord[:, [j], :, :], y_rel_coord[:, [j], :, :]], dim=1)
            mask_feat_coord_cats.append(mask_feat_coord_cat)
        mask_feat_coord_cats = torch.cat(mask_feat_coord_cats, dim=1)
        # run the dynamic mask head for all instances at once via grouped convs
        conv1 = F.conv2d(mask_feat_coord_cats, weights1, bias1, groups=ins_num).relu()
        conv2 = F.conv2d(conv1, weights2, bias2, groups=ins_num).relu()
        masks_per_image = F.conv2d(conv2, weights3, bias3, groups=ins_num)
        masks_per_image = aligned_bilinear(masks_per_image, self.strides[0])[0].sigmoid()
        for j in range(ins_num):
            ind = inds[j]
            mask_gt = masks_t[i][matched_idxes[ind]].float()
            mask_pred = masks_per_image[j]
            mask_loss += self.dice_loss(mask_pred, mask_gt)
```
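As a side note, the per-instance Python loop that builds `mask_feat_coord_cats` can be replaced by one batched concatenation. A minimal sketch under the same shape assumptions as above (`mask_feat` is `(1, C, h, w)`, `x_rel_coord`/`y_rel_coord` are `(1, ins_num, h, w)`); this is not from the official release:

```python
# repeat the shared mask features once per instance
mask_feat_rep = mask_feat.expand(ins_num, -1, -1, -1)             # (ins_num, C, h, w)
# pair each instance's x/y rel. coord maps as 2 extra channels
coords = torch.stack([x_rel_coord[0], y_rel_coord[0]], dim=1)     # (ins_num, 2, h, w)
mask_feat_coord_cats = torch.cat([mask_feat_rep, coords], dim=1)  # (ins_num, C + 2, h, w)
# flatten to the (1, ins_num * (C + 2), h, w) layout the grouped conv expects
mask_feat_coord_cats = mask_feat_coord_cats.reshape(1, -1, h, w)
```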
@txytju CondInst has been released.
Hi~ @tianzhi0549, I am trying to implement the rel. coord. in CondInst. Am I right? Could you provide the official code snippet of the rel. coord.? Thanks!