aim-uofa / AdelaiDet

AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
https://git.io/AdelaiDet

Attempt to Reproduce the Results of CondInst. #39

Closed Yuxin-CV closed 4 years ago

Yuxin-CV commented 4 years ago

Hi~ @tianzhi0549, I want to confirm the shared-head architecture of CondInst. Design A

                 --- conv --- conv --- conv --- conv --- cls_pred 
                |       
                |                                        --- ctr_pred 
                |                                       |
FPN features --- --- conv --- conv --- conv --- conv --- --- reg_pred 
                |
                |
                |
                 --- conv --- conv --- conv --- conv --- controller_pred

Design B

                 --- conv --- conv --- conv --- conv --- cls_pred 
                |       
                |                                        --- ctr_pred 
                |                                       |
FPN features --- --- conv --- conv --- conv --- conv --- --- reg_pred 
                                                        |
                                                         --- controller_pred

Which one is right? I found that Design B degrades Box AP, and the mask AP is also very low. Here are my results for MS-R-50_1x.

| | AP | AP50 | AP75 |
| --- | --- | --- | --- |
| Box AP | 38.269 | 57.210 | 55.405 |
| Mask AP | 27.531 | 51.157 | 47.783 |

The Box AP should be higher than 39.5 with MS training (~39.5) & multi-task training (+~1.0), so I think Design B is wrong. It is hard for one branch to handle 3 predictions, and the gradient from controller_pred degrades reg_pred.
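To make the two designs concrete, here is a rough PyTorch sketch of how I read them (all names and the dynamic-filter count are my own reading of the paper, not the official code):

```python
import torch
import torch.nn as nn


def tower(num_convs=4, channels=256):
    # A stack of conv3x3 + GN + ReLU, as in the FCOS-style heads.
    layers = []
    for _ in range(num_convs):
        layers += [
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.GroupNorm(32, channels),
            nn.ReLU(inplace=True),
        ]
    return nn.Sequential(*layers)


class CondInstHeadDesignB(nn.Module):
    """Design B: controller_pred shares the bbox tower with reg/ctr preds."""

    def __init__(self, channels=256, num_classes=80, num_gen_params=169):
        super().__init__()
        self.cls_tower = tower(channels=channels)
        self.bbox_tower = tower(channels=channels)
        self.cls_pred = nn.Conv2d(channels, num_classes, 3, padding=1)
        self.reg_pred = nn.Conv2d(channels, 4, 3, padding=1)
        self.ctr_pred = nn.Conv2d(channels, 1, 3, padding=1)
        # Design B: controller_pred hangs off the same tower as reg/ctr.
        # Design A would give it its own 4-conv tower instead.
        self.controller_pred = nn.Conv2d(channels, num_gen_params, 3, padding=1)

    def forward(self, feature):
        # Applied independently to each FPN level.
        cls_feat = self.cls_tower(feature)
        box_feat = self.bbox_tower(feature)
        return (
            self.cls_pred(cls_feat),
            self.reg_pred(box_feat),
            self.ctr_pred(box_feat),
            self.controller_pred(box_feat),
        )
```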

tianzhi0549 commented 4 years ago

@Yuxin-CV Unfortunately, we use design B ...

Yuxin-CV commented 4 years ago

> @Yuxin-CV Unfortunately, we use design B ...

Thanks for your reply. I think it would be better to make this clear in the paper. As mentioned in the paper, there are 3 heads: Classification Head, Controller Head & Center-ness Head, and I really can't figure out from the paper how they are organized...

Yuxin-CV commented 4 years ago

BTW, will the code of CondInst be released soon? I find it really hard to reproduce the results...

tianzhi0549 commented 4 years ago

@Yuxin-CV Our paper is in submission, so we won't release the code until our paper gets accepted. You may refer to other implementations of CondInst.

Yuxin-CV commented 4 years ago

> @Yuxin-CV Our paper is in submission, so we won't release the code until our paper gets accepted. You may refer to other implementations of CondInst.

Thanks for your reply.

Yuxin-CV commented 4 years ago

I am also interested in the loss behavior of the FCOS part in CondInst, e.g., cls_loss & reg_loss. Compared with FCOS, do these losses become higher or lower in CondInst? This information would be quite helpful for debugging. Thanks!

tianzhi0549 commented 4 years ago

@Yuxin-CV I cannot find the log files now, but I think the detector's losses should be lower because of the improved detection performance.

Yuxin-CV commented 4 years ago

> @Yuxin-CV I cannot find the log files now, but I think the detector's losses should be lower because of the improved detection performance.

Thanks for your suggestion~ I will check my code.

Yuxin-CV commented 4 years ago

> @Yuxin-CV Our paper is in submission, so we won't release the code until our paper gets accepted. You may refer to other implementations of CondInst.

Hi~ @tianzhi0549 My implementation of CondInst is built upon https://github.com/Epiphqny/CondInst, and I found a bug in this codebase:

https://github.com/Epiphqny/CondInst/blob/1b03b70ea6c71f0e951ed2771ad16a24515d4c3c/fcos/modeling/fcos/fcos_outputs.py#L322

The N dim and L dim of the locations are not transposed, so when batch_size_per_GPU > 1 the implementation of the rel. coord. is wrong, which degrades the results. I fixed this bug and got improved results compared with my previous ones: https://github.com/aim-uofa/AdelaiDet/issues/39#issue-604484494
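For reference, this is the relative-coordinate computation as I believe it is intended (shapes and names are my own, not the linked codebase's):

```python
import torch


def compute_rel_coords(instance_locations, locations, stride):
    """
    instance_locations: (N, 2) absolute (x, y) of the location that generated
                        each of the N instances in the whole batch.
    locations:          (L, 2) absolute (x, y) of every mask-branch location.
    stride:             normalizer, e.g. the mask-branch stride.
    Returns (N, 2, L): one relative-coordinate map per instance.
    """
    # (N, 1, 2) - (1, L, 2) -> (N, L, 2): offsets from each instance's center
    # to every mask-branch location.
    rel = instance_locations[:, None, :] - locations[None, :, :]
    # Transpose the L and coordinate dims so each instance gets a (2, L) map.
    rel = rel.permute(0, 2, 1).float()
    return rel / stride
```

With the N and L dims mixed up, the maps can still come out looking right for a single image, which is consistent with the bug only showing up when batch_size_per_GPU > 1.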

| Resolution of Mask Prediction | Box AP | Mask AP |
| --- | --- | --- |
| 1/8 | 39.5 | 33.0 (-1.4) |
| 1/4 | 39.5 | 31.7 (-4.1) |
| 1/2 | 39.5 | 34.5 (-1.2) |

The results at different mask-prediction resolutions show a similar & reasonable Box AP (39.5), but the Mask AP is abnormal, especially in the 1/4-resolution case. So I think there are at least some problems in the alignment of the mask feature during training (I use the aligned_bilinear that you @tianzhi0549 mentioned in https://github.com/Epiphqny/CondInst/issues/1) & in the postprocessing. To avoid the feature-alignment issue and study the postprocessing part of my code, I focus on the 1/8-resolution case and modify the postprocessing code in

https://github.com/Epiphqny/CondInst/blob/ea3f717fce73a8e4c273f1379c9d9c3550387e1b/fcos/modeling/fcos/fcos_outputs.py#L391-L392

to

masks = masks_per_image[0, :, :self.image_sizes[0][0] // 8, :self.image_sizes[0][1] // 8].sigmoid()

The masks are then rescaled to the original image resolution (using F.interpolate). This gives a 0.6 boost in Mask AP (33.6), but there is still a 0.8 AP gap for the 1/8-resolution case.
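For clarity, the full post-processing I ended up with for the 1/8 case looks roughly like this (the function name and exact shapes are my own, not the codebase's):

```python
import torch.nn.functional as F


def postprocess_masks(mask_logits, image_size, orig_size, mask_stride=8):
    """
    mask_logits: (N, H/8, W/8) mask logits on the padded-input grid.
    image_size:  (h, w) of the resized, un-padded input image.
    orig_size:   (H0, W0) of the original image.
    """
    h, w = image_size
    # Crop away the padded border before interpolating, otherwise the
    # padded logits bleed into the rescaled masks.
    masks = mask_logits[:, : h // mask_stride, : w // mask_stride].sigmoid()
    # Rescale straight to the original image resolution.
    masks = F.interpolate(
        masks[None], size=orig_size, mode="bilinear", align_corners=False
    )[0]
    return masks > 0.5
```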

Also, there is a 0.2 AP gap for Box AP (https://github.com/aim-uofa/AdelaiDet/issues/20#issuecomment-610162815). This indicates that there must be some problems in the training code, probably in the feature alignment between the mask prediction & the GT (but I already use aligned_bilinear during training...) or in the GT preparation process (most probably, I think...).
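For reference, the aligned_bilinear I use during training is my own reimplementation of what is described in the linked issue, so treat the details as assumptions:

```python
import torch.nn.functional as F


def aligned_bilinear(tensor, factor):
    # Upsample an NCHW tensor by an integer factor while keeping the
    # top-left corners of the input and output grids aligned.
    assert tensor.dim() == 4 and factor >= 1 and int(factor) == factor
    if factor == 1:
        return tensor
    h, w = tensor.size()[2:]
    tensor = F.pad(tensor, pad=(0, 1, 0, 1), mode="replicate")
    oh, ow = factor * h + 1, factor * w + 1
    tensor = F.interpolate(
        tensor, size=(oh, ow), mode="bilinear", align_corners=True
    )
    tensor = F.pad(tensor, pad=(factor // 2, 0, factor // 2, 0), mode="replicate")
    return tensor[:, :, : oh - 1, : ow - 1]
```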

So I wonder, could you @tianzhi0549 please help me with the above problems? Could you provide some detailed information or some code snippets of the feature alignment, the postprocessing & the GT preparation process? I really need your help...

tianzhi0549 commented 4 years ago

@Yuxin-CV We have released the code of BlendMask. CondInst is implemented with the same codebase. I think it should be helpful to you. Also, a hint: if the performance degradation is due to misalignment, you should see much more degradation on small objects than on large objects.

Yuxin-CV commented 4 years ago

> @Yuxin-CV We have released the code of BlendMask. CondInst is implemented with the same codebase. I think it should be helpful to you. Also, a hint: if the performance degradation is due to misalignment, you should see much more degradation on small objects than on large objects.

Thanks for your suggestions. @tianzhi0549 I modified my code and got improved results.

| Resolution of Mask Prediction | Box AP | Mask AP | APs | APm | APl |
| --- | --- | --- | --- | --- | --- |
| 1/8 | 39.5 | 33.6 (-0.8) | 14.7 (-0.4) | 37.9 (0.5) | 49.0 (-1.8) |
| 1/4 | 39.5 | 34.7 (-1.0) | 16.2 (-0.8) | 38.3 (-1.0) | 49.5 (-1.6) |
| 1/2 | 39.5 (-0.2) | 35.0 (-0.7) | 17.0 (-0.1) | 38.6 (-0.5) | 49.3 (-0.9) |

It seems that the bottleneck for the 1/2 case is no longer APs (misalignment). For now,

  1. The performance gap in APl & APm is relatively large (0.9 & 0.5).
  2. There is still a 0.2 gap in Box AP.

Could you please give me some suggestions?

Yuxin-CV commented 4 years ago

BTW, I want to confirm the following:

  1. For the structure of the mask branch, I use Design B, is that right? (A sketch of my implementation is at the end of this comment.)
     Design A: P3 feature -> Conv(256, 128) -> 4 x Conv(128, 128) -> Conv(128, 8)
     Design B: P3 feature -> Conv(256, 128) -> 3 x Conv(128, 128) -> Conv(128, 8)
  2. The mask feature is upsampled after the mask FCN head, before computing the Dice loss, is that right?

Thanks! @tianzhi0549
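To be concrete, this is a sketch of my Design B mask branch and of how I compute the Dice loss on the upsampled masks (my own code, so the names and details are assumptions, not the official implementation):

```python
import torch
import torch.nn as nn


class MaskBranch(nn.Module):
    """Design B: P3 -> Conv(256,128) -> 3 x Conv(128,128) -> Conv(128,8)."""

    def __init__(self, in_channels=256, channels=128, out_channels=8):
        super().__init__()
        layers = [nn.Conv2d(in_channels, channels, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(3):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(channels, out_channels, 3, padding=1)]
        self.tower = nn.Sequential(*layers)

    def forward(self, p3):
        return self.tower(p3)


def dice_loss(mask_logits, gt_masks, eps=1e-5):
    # mask_logits: (N, H, W) predictions already upsampled to the GT resolution,
    # i.e. the upsampling happens after the dynamic mask FCN, before the loss.
    pred = mask_logits.sigmoid().flatten(1)
    gt = gt_masks.flatten(1).float()
    inter = (pred * gt).sum(dim=1)
    union = (pred ** 2).sum(dim=1) + (gt ** 2).sum(dim=1)
    return 1.0 - (2.0 * inter + eps) / (union + eps)
```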

tianzhi0549 commented 4 years ago

@Yuxin-CV 1) The mask branch should be similar to the basis module in BlendMask. But we do not upsample the feature maps from 8x to 4x here. I don't think these design choices of the mask branch are critical. 2) Yes.

Yuxin-CV commented 4 years ago

> @Yuxin-CV 1) The mask branch should be similar to the basis module in BlendMask. But we do not upsample the feature maps from 8x to 4x here. I don't think these design choices of the mask branch are critical. 2) Yes.

Thanks for your reply! Could you give me some suggestions for the issue mentioned in https://github.com/aim-uofa/AdelaiDet/issues/39#issuecomment-619559827?

Yuxin-CV commented 4 years ago

> @Yuxin-CV 1) The mask branch should be similar to the basis module in BlendMask. But we do not upsample the feature maps from 8x to 4x here. I don't think these design choices of the mask branch are critical. 2) Yes.

Hi~ @tianzhi0549, thanks for your reply. I wonder what kind of activation function, normalization layer & initialization method you use in the CondInst mask branch. It is not mentioned in the paper.

sxhxliang commented 4 years ago

> > @Yuxin-CV We have released the code of BlendMask. CondInst is implemented with the same codebase. I think it should be helpful to you. Also, a hint: if the performance degradation is due to misalignment, you should see much more degradation on small objects than on large objects.
>
> Thanks for your suggestions. @tianzhi0549 I modified my code and got improved results.
>
> | Resolution of Mask Prediction | Box AP | Mask AP | APs | APm | APl |
> | --- | --- | --- | --- | --- | --- |
> | 1/8 | 39.5 | 33.6 (-0.8) | 14.7 (-0.4) | 37.9 (0.5) | 49.0 (-1.8) |
> | 1/4 | 39.5 | 34.7 (-1.0) | 16.2 (-0.8) | 38.3 (-1.0) | 49.5 (-1.6) |
> | 1/2 | 39.5 (-0.2) | 35.0 (-0.7) | 17.0 (-0.1) | 38.6 (-0.5) | 49.3 (-0.9) |
>
> It seems that the bottleneck for the 1/2 case is no longer APs (misalignment). For now,
>
>   1. The performance gap in APl & APm is relatively large (0.9 & 0.5).
>   2. There is still a 0.2 gap in Box AP.
>
> Could you please give me some suggestions?

I get low mask APs. Can you share your experiment log?

[05/04 09:11:08 d2.evaluation.testing]: copypaste: 39.4628,58.8626,42.7325,23.9317,42.9279,50.3428
[05/04 09:11:08 d2.evaluation.testing]: copypaste: Task: segm
[05/04 09:11:08 d2.evaluation.testing]: copypaste: AP,AP50,AP75,APs,APm,APl
[05/04 09:11:08 d2.evaluation.testing]: copypaste: 32.8535,55.2280,33.6173,13.3537,36.2689,49.4594
[05/04 0