Epiphqny / CondInst

Conditional Convolutions for Instance Segmentation, achieves 37.1 mAP on COCO val
https://arxiv.org/abs/2003.05664

On the bilinear in your implementation. #1

Closed tianzhi0549 closed 4 years ago

tianzhi0549 commented 4 years ago

https://github.com/Epiphqny/CondInst/blob/4a519c12b7be83f86b3d75c62cf3a87a9dec31a7/fcos/modeling/fcos/fcos_outputs.py#L366 The default bilinear upsampling in PyTorch is not aligned, which may significantly degrade the performance, in particular for small objects.

Please try the aligned bilinear.

import torch.nn.functional as F

def aligned_bilinear(tensor, factor):
    """Upsample an NCHW tensor by an integer factor with aligned corners."""
    assert tensor.dim() == 4
    assert factor >= 1
    assert int(factor) == factor

    if factor == 1:
        return tensor

    h, w = tensor.size()[2:]
    # Replicate-pad by one pixel so the interpolated grid covers the
    # full extent of the input feature map.
    tensor = F.pad(tensor, pad=(0, 1, 0, 1), mode="replicate")
    oh = factor * h + 1
    ow = factor * w + 1
    tensor = F.interpolate(
        tensor, size=(oh, ow),
        mode='bilinear',
        align_corners=True
    )
    # Shift by half the factor so output pixel centers line up with
    # input pixel centers, then crop to (factor * h, factor * w).
    tensor = F.pad(
        tensor, pad=(factor // 2, 0, factor // 2, 0),
        mode="replicate"
    )

    return tensor[:, :, :oh - 1, :ow - 1]
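To see why alignment matters, note that the two conventions map each output pixel to a different source position. A minimal 1-D sketch of the sampling formulas (a simplified illustration of the convention, not library code):

```python
def src_pos(i, in_size, out_size, align_corners):
    """Source coordinate sampled for output index i under each convention."""
    if align_corners:
        return i * (in_size - 1) / (out_size - 1)
    # default (align_corners=False): half-pixel-center convention
    return (i + 0.5) * in_size / out_size - 0.5

# Upsampling 4 -> 8: only align_corners=True maps the last output
# pixel exactly onto the last input pixel.
print(src_pos(7, 4, 8, True))   # → 3.0
print(src_pos(7, 4, 8, False))  # → 3.25
```

With the default convention the sampled grid is shifted relative to the input, which is exactly the misalignment the `aligned_bilinear` helper above compensates for.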
Epiphqny commented 4 years ago

@tianzhi0549 Thanks for pointing it out, I will try it and update the new results later.

tianzhi0549 commented 4 years ago

@Epiphqny I also noticed that you seem to be using absolute coordinates as the input to the mask heads, which is not correct. It is important to use relative coordinates here because we want the generated filters to be position-independent.

Epiphqny commented 4 years ago

@tianzhi0549 The coordinates in this implementation range from -1 to 1. What do you mean by relative coordinates? Should they be in [0, 1] instead?

tianzhi0549 commented 4 years ago

@Epiphqny https://github.com/aim-uofa/AdelaiDet/issues/10. You can refer to the explanation here.

Epiphqny commented 4 years ago

@tianzhi0549 OK, I will try that.

Epiphqny commented 4 years ago

@tianzhi0549 It sounds like the relative coordinates are in some way like the center-ness... but implemented in a different way. Just my opinion.

tianzhi0549 commented 4 years ago

@Epiphqny They may be similar in some aspects, but they are designed for totally different purposes ...

Epiphqny commented 4 years ago

@tianzhi0549 Yes, both are interesting ideas!

Epiphqny commented 4 years ago

@tianzhi0549 Hi, I replaced the original upsampling with the aligned version and used the upsampled mask to compute the loss; the AP is now 37.1. This is still the absolute-coordinate version, though. I will post new results after the relative-coordinate version finishes training.

tianzhi0549 commented 4 years ago

@Epiphqny Great! For the memory usage issue, you could limit the maximum number of samples used to compute masks during training. Using relative coordinates may also significantly boost the performance.

Epiphqny commented 4 years ago

@tianzhi0549 Perhaps there is some problem in my implementation of relative coordinates: it only achieves 36.9 mAP, which is worse than the absolute-coordinate version.

tianzhi0549 commented 4 years ago

@Epiphqny If possible, you can push your code to a new branch of the repo. I can help check it.

Epiphqny commented 4 years ago

Hi @tianzhi0549, I have added the code in the relative_coordinate branch. Thank you very much for the help!

tianzhi0549 commented 4 years ago

@Epiphqny Are you sure this line is correct? https://github.com/Epiphqny/CondInst/blob/1b03b70ea6c71f0e951ed2771ad16a24515d4c3c/fcos/modeling/fcos/fcos_outputs.py#L591

Yuxin-CV commented 4 years ago

@Epiphqny Hi~ Thanks for sharing your code! It seems that the settings of IMS_PER_BATCH and BASE_LR in your config are incorrect. https://github.com/Epiphqny/CondInst/blob/ea3f717fce73a8e4c273f1379c9d9c3550387e1b/configs/CondInst/Base-FCOS.yaml#L17-L18 IMS_PER_BATCH and BASE_LR should be changed together according to the Linear Scaling Rule: you need to set the learning rate proportional to the batch size if you use a different number of GPUs or images per GPU, e.g., IMS_PER_BATCH = 4 & BASE_LR = 0.0025.

I also found a similar problem in your Yolact_fcos repo: https://github.com/Epiphqny/Yolact_fcos/blob/b131542a930499523343d3fd660088e7e372c317/configs/Yolact/Base-FCOS.yaml#L16-L18

Although changing IMS_PER_BATCH and BASE_LR according to the Linear Scaling Rule cannot guarantee reproducing the results in the paper, I think it can help you obtain a very close result. @tianzhi0549 @Epiphqny
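The Linear Scaling Rule amounts to a simple proportion. A minimal sketch, assuming the common detectron2 default of 16 images per batch at LR 0.01 (the 4-image / 0.0025 pair is the example from this thread):

```python
def scaled_lr(base_lr, base_batch, new_batch):
    """Scale the learning rate linearly with the total batch size."""
    return base_lr * new_batch / base_batch

# Going from 16 images per batch at LR 0.01 down to 4 images per batch
print(scaled_lr(0.01, 16, 4))  # → 0.0025
```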

Epiphqny commented 4 years ago

@Yuxin-CV Thank you very much for pointing that out, I will try the Linear Scaling Rule later.

Epiphqny commented 4 years ago

@tianzhi0549 Sorry, I cannot find the problem in this line, could you point it out directly?

tianzhi0549 commented 4 years ago

@Epiphqny I would suggest that you do all the coordinate transformations at the scale of the input image. After you get the final relative coordinates, you can normalize them by a constant scale. Please make sure that even after normalization, the locations generating the filters are always at (0, 0).

Epiphqny commented 4 years ago

@tianzhi0549 I have subtracted the center coordinate in https://github.com/Epiphqny/CondInst/blob/1b03b70ea6c71f0e951ed2771ad16a24515d4c3c/fcos/modeling/fcos/fcos_outputs.py#L600 , and the values at the center locations are zero.

Yuxin-CV commented 4 years ago

> @Yuxin-CV Thank you very much for pointing that out, I will try the Linear Scaling Rule later.

Personally, I think you should try the R-50 1x lr_schedule with input_size = 800 and batch_size = 16 first, before using a stronger backbone and a longer lr_schedule. You can get the results in less than 1 day if you have access to 4 or 8 GPUs. Looking forward to your result! @Epiphqny

Yuxin-CV commented 4 years ago

BTW, I wonder how you @tianzhi0549 implemented the forward_mask() part in the official code. Do you simply use a for loop, just like @Epiphqny's implementation: https://github.com/Epiphqny/CondInst/blob/ea3f717fce73a8e4c273f1379c9d9c3550387e1b/fcos/modeling/fcos/fcos_outputs.py#L585-L607 or some other highly optimized implementation, e.g., a CUDA kernel?

Yuxin-CV commented 4 years ago

Hi~ @Epiphqny I also found that the mask loss's normalization factor N_pos in your code is not reduced across GPUs. https://github.com/Epiphqny/CondInst/blob/4a519c12b7be83f86b3d75c62cf3a87a9dec31a7/fcos/modeling/fcos/fcos_outputs.py#L581-L582 I think it is better to use num_pos_avg as the normalization factor, which is the average number of positive samples across different GPUs. https://github.com/Epiphqny/CondInst/blob/4a519c12b7be83f86b3d75c62cf3a87a9dec31a7/fcos/modeling/fcos/fcos_outputs.py#L504-L508

mask_loss = mask_loss / num_pos_avg
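A minimal sketch of the arithmetic behind this suggestion. In a real run each GPU holds only its local count and the sum comes from an all-reduce; here the per-GPU counts are passed in directly, and the helper name is illustrative, not from the official code:

```python
def num_pos_avg(per_gpu_pos_counts):
    """Average number of positive samples across GPUs, clamped to >= 1
    so the loss divisor never becomes zero."""
    total = sum(per_gpu_pos_counts)
    return max(total / len(per_gpu_pos_counts), 1.0)

# Uneven positives across 4 GPUs still give every GPU the same divisor,
# so the effective loss weight per positive sample stays consistent.
print(num_pos_avg([10, 2, 0, 4]))  # → 4.0
```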
Yuxin-CV commented 4 years ago

> @tianzhi0549 I have subtracted the center coordinate in https://github.com/Epiphqny/CondInst/blob/1b03b70ea6c71f0e951ed2771ad16a24515d4c3c/fcos/modeling/fcos/fcos_outputs.py#L600, and the values at the center locations are zero.

@Epiphqny I think the rel. coord. should be location specific, just like:

For each location (x, y) on input_img:
    x_range = torch.arange(W_mask)
    y_range = torch.arange(H_mask)
    y_grid, x_grid = torch.meshgrid(y_range, x_range)
    y_rel_coord = (y_grid - y / mask_stride).normalize_to(-1, 1)  # pseudocode
    x_rel_coord = (x_grid - x / mask_stride).normalize_to(-1, 1)  # pseudocode
    rel_coord = torch.stack([x_rel_coord, y_rel_coord])

@tianzhi0549 Am I right? Could you provide the official code snippet of rel. coord.? Thanks!
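The per-location idea can be checked in plain Python. A minimal sketch (all names and the normalization constant here are illustrative assumptions, not from the official code):

```python
def rel_coords(cx, cy, h, w, norm):
    """Relative-coordinate map for one instance whose filter-generating
    location is (cx, cy) on an h x w mask grid, normalized by norm."""
    return [[((x - cx) / norm, (y - cy) / norm) for x in range(w)]
            for y in range(h)]

# The map is location specific: the entry at the generating location
# itself is always (0.0, 0.0), as required for position independence.
grid = rel_coords(3, 2, 5, 8, norm=8.0)
print(grid[2][3])  # → (0.0, 0.0)
```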

Epiphqny commented 4 years ago

@Yuxin-CV Please modify the code, train the model, and report the result here. I will update the repo if there is an improvement. I don't have spare GPUs to train the model now.

Yuxin-CV commented 4 years ago

> @Yuxin-CV Please modify the code and train the model, then report the result here. I will update if there is improvement. I don't have extra GPU to train the model now.

OK

tianzhi0549 commented 4 years ago

@Epiphqny For your information. https://github.com/aim-uofa/AdelaiDet/issues/23#issuecomment-611870073. Thank you:-).

Epiphqny commented 4 years ago

@tianzhi0549 OK, thanks for providing the code.

guangdongliang commented 3 years ago

@tianzhi0549 I got the same result in your docker using "aligned_bilinear" and "F.interpolate"! [image]

chufengt commented 3 years ago

@tianzhi0549 One question about aligned_bilinear: I noticed that other interpolation operations in detectron2 and adet require align_corners=False (e.g., image and mask resizing). Should we change the other align_corners to True when using CondInst? Thanks.