aim-uofa / AdelaiDet

AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
https://git.io/AdelaiDet

On the rel. coord. of CondInst #23

Open Yuxin-CV opened 4 years ago

Yuxin-CV commented 4 years ago

Hi~ @tianzhi0549 I am trying to implement the rel. coord. in CondInst.

For location of interest (x, y) on the input image:
    x_range = torch.arange(W_mask)
    y_range = torch.arange(H_mask)
    y_grid, x_grid = torch.meshgrid(y_range, x_range)
    y_rel_coord = (y_grid - y / mask_stride).normalize_to(-1, 1)
    x_rel_coord = (x_grid - x / mask_stride).normalize_to(-1, 1)
    rel_coord = torch.cat([x_rel_coord, y_rel_coord])

Am I right? Could you provide the official code snippet of rel. coord.? Thanks!

tianzhi0549 commented 4 years ago

@Yuxin-CV I don't think it is correct to normalize the coordinates to (-1, 1): that implies (x, y) is the center of the map, which is not always true. Just scaling the coordinates down by a constant (e.g., 400) is fine.

Just to be clear, the correct code should look like this:

For location of interest (x, y) on the input image:
    x_range = torch.arange(W_mask)
    y_range = torch.arange(H_mask)
    y_grid, x_grid = torch.meshgrid(y_range, x_range)
    y_rel_coord = (y_grid - (y - mask_stride // 2) / mask_stride) / 400.0
    x_rel_coord = (x_grid - (x - mask_stride // 2) / mask_stride) / 400.0
    rel_coord = torch.cat([x_rel_coord, y_rel_coord])

Note that mapping (x, y) from the input to feature maps should be ((x - stride/2) / stride, (y - stride/2) / stride).
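
For reference, a minimal runnable sketch of the above (the function name and signature are mine for illustration, not from the repo):

    import torch

    def compute_rel_coord(x, y, H_mask, W_mask, mask_stride, scale=400.0):
        # Relative-coordinate maps for a location of interest (x, y) on the
        # input image, scaled down by a constant rather than normalized to
        # (-1, 1). Returns a (2, H_mask, W_mask) tensor: x offsets, then y.
        x_range = torch.arange(W_mask, dtype=torch.float32)
        y_range = torch.arange(H_mask, dtype=torch.float32)
        y_grid, x_grid = torch.meshgrid(y_range, x_range)
        # Map (x, y) from the input image onto the mask feature map:
        # ((x - stride / 2) / stride, (y - stride / 2) / stride).
        y_rel_coord = (y_grid - (y - mask_stride // 2) / mask_stride) / scale
        x_rel_coord = (x_grid - (x - mask_stride // 2) / mask_stride) / scale
        return torch.stack([x_rel_coord, y_rel_coord], dim=0)

For example, compute_rel_coord(320, 240, H_mask=100, W_mask=152, mask_stride=8) yields the two coordinate channels that get concatenated to the mask branch features.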

Yuxin-CV commented 4 years ago

Thanks for your prompt reply. @tianzhi0549

txytju commented 4 years ago

Hello @tianzhi0549, I have implemented rel. coord. according to your hint above. However, the AP dropped by 3 points. Would you mind checking the correctness of my code? Thanks.

For CondInst:

    N, C, h, w = self.masks.shape
    mask_stride = self.strides[0]
    x_range = torch.arange(w)
    y_range = torch.arange(h)
    y_grid, x_grid = torch.meshgrid(y_range, x_range)
    y_grid = y_grid.to(self.masks.device)
    x_grid = x_grid.to(self.masks.device)

    r_h = int(h * self.strides[0])
    r_w = int(w * self.strides[0])
    targets_masks = [target_im.gt_masks.tensor for target_im in self.gt_instances]
    masks_t = self.prepare_masks(h, w, r_h, r_w, targets_masks)
    mask_loss = self.masks[0].new_tensor(0.0)
    batch_ins = im_idxes.shape[0]

    # for each image
    for i in range(N):
        inds = (im_idxes == i).nonzero().flatten()
        ins_num = inds.shape[0]
        if ins_num > 0:
            controllers = controllers_pred[inds]

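            # Split each controller vector into weights/biases for the three
            # 1x1 convs of the dynamic mask head (10 -> 8, 8 -> 8, 8 -> 1 channels).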
            weights1 = controllers[:, :80].reshape(-1, 8, 10).reshape(-1, 10).unsqueeze(-1).unsqueeze(-1)
            bias1 = controllers[:, 80:88].flatten()
            weights2 = controllers[:, 88:152].reshape(-1, 8, 8).reshape(-1, 8).unsqueeze(-1).unsqueeze(-1)
            bias2 = controllers[:, 152:160].flatten()
            weights3 = controllers[:, 160:168].unsqueeze(-1).unsqueeze(-1)
            bias3 = controllers[:, 168:169].flatten()

            mask_feat = self.masks[None, i]
            location = locations[inds]
            x = location[:, 0]
            y = location[:, 1]
            y_rel_coord = (y_grid[None, None, ...] - (y[None, ..., None, None] - mask_stride // 2) / mask_stride) / self.coord_constant  # y_rel_coord (1, num_insts, h, w)
            x_rel_coord = (x_grid[None, None, ...] - (x[None, ..., None, None] - mask_stride // 2) / mask_stride) / self.coord_constant  # x_rel_coord (1, num_insts, h, w)
            mask_feat_coord_cats = []
            for j in range(ins_num):
                mask_feat_coord_cat = torch.cat([mask_feat, x_rel_coord[:, [j], :, :], y_rel_coord[:, [j], :, :]], dim=1)
                mask_feat_coord_cats.append(mask_feat_coord_cat)
            mask_feat_coord_cats = torch.cat(mask_feat_coord_cats, dim=1)

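            # Grouped convs: run every instance's dynamic mask head in one
            # call, one group per instance.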
            conv1 = F.conv2d(mask_feat_coord_cats, weights1, bias1, groups=ins_num).relu()
            conv2 = F.conv2d(conv1, weights2, bias2, groups=ins_num).relu()

            masks_per_image = F.conv2d(conv2, weights3, bias3, groups=ins_num)
            masks_per_image = aligned_bilinear(masks_per_image, self.strides[0])[0].sigmoid()
            for j in range(ins_num):
                ind = inds[j]
                mask_gt = masks_t[i][matched_idxes[ind]].float()
                mask_pred = masks_per_image[j]
                mask_loss += self.dice_loss(mask_pred, mask_gt)
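
As a side note, the per-instance Python loop that builds mask_feat_coord_cats above could be replaced by a single concatenation; a sketch under the same shapes (my own rewrite, not from the repo):

    # mask_feat: (1, C, h, w); x_rel_coord / y_rel_coord: (1, num_insts, h, w)
    ins_feat = mask_feat.expand(ins_num, -1, -1, -1)               # (num_insts, C, h, w)
    coords = torch.stack([x_rel_coord[0], y_rel_coord[0]], dim=1)  # (num_insts, 2, h, w)
    mask_feat_coord_cats = torch.cat([ins_feat, coords], dim=1)    # (num_insts, C + 2, h, w)
    # Flatten instances into channels to match the grouped-conv layout.
    mask_feat_coord_cats = mask_feat_coord_cats.reshape(1, -1, h, w)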

tianzhi0549 commented 4 years ago

@txytju CondInst has been released.