dbolya / yolact

A simple, fully convolutional model for real-time instance segmentation.
MIT License

Question about performance. #226

Closed Markusgami closed 4 years ago

Markusgami commented 4 years ago

Thank you for the novel idea for instance segmentation. It's great! Mask R-CNN-like methods are really not elegant. Still, I have a question. In the paper, you explain that the low mask mAP is largely due to the low box mAP. Have you run any experiments to confirm this? For example, I could use the ground-truth boxes at test time and see what the mask mAP is. Or one could try this idea with a SotA detector. Single-stage detectors have improved a lot this year.

Markusgami commented 4 years ago

My fault. You can't use gt boxes at test time, because gt boxes have no mask coefficients. But I'm still curious about the performance when trained with a SotA detector.

dbolya commented 4 years ago

We haven't been able to try any current SotA detectors so far because they're all really slow, and people aren't able to reproduce, for instance, YOLOv3 results in PyTorch. Did you have any specific, fast detector in mind? It would be a great subject for future research.

ljjyxz123 commented 4 years ago

@dbolya I have an idea, but I don't know whether it works. I think we could train the bounding box detector, the mask coefficients, and the prototypes separately. For example, if we want the bounding box detector to work well, we can use a good detector as the backbone. If we want the prototypes and mask coefficients to be better, we can use the ground-truth bounding box for supervision in the crop stage, because the ground-truth box is more precise and will make the cropping stage more precise.

Just an idea; it needs to be checked. Do you think it will work?

feiyuhuahuo commented 4 years ago

@dbolya EfficientDet is quite fast, and its BiFPN looks cool. But it seems there isn't a PyTorch implementation yet.

dbolya commented 4 years ago

@ljjyxz123 We already train masks with the gt bounding box crop instead of the predicted one (the predicted one does slightly worse and takes longer to converge, as expected). And we can't really train a separate detector for the sole purpose of producing boxes, since the box and mask branches need to be coupled so that they agree on the same instances.
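
To make the role of the crop concrete, here is a minimal sketch of YOLACT-style mask assembly as the paper describes it: a linear combination of prototype masks weighted by the instance's mask coefficients, a sigmoid, then a crop to a box. It is written in plain Python rather than PyTorch, and the function name `assemble_mask` and the toy values are assumptions for illustration, not code from this repository.

```python
import math

def assemble_mask(prototypes, coeffs, box, threshold=0.5):
    """Combine prototypes linearly, apply a sigmoid, crop to `box`, threshold.

    prototypes: list of k grids, each H x W (lists of lists of floats)
    coeffs:     list of k mask coefficients for one instance
    box:        (x1, y1, x2, y2) in pixels, with x2 and y2 exclusive
    """
    h, w = len(prototypes[0]), len(prototypes[0][0])
    x1, y1, x2, y2 = box
    mask = [[0] * w for _ in range(h)]  # everything outside the box stays 0
    for y in range(max(y1, 0), min(y2, h)):
        for x in range(max(x1, 0), min(x2, w)):
            logit = sum(c * p[y][x] for c, p in zip(coeffs, prototypes))
            prob = 1.0 / (1.0 + math.exp(-logit))
            mask[y][x] = 1 if prob > threshold else 0
    return mask

# Toy example (hypothetical values): two 3x3 prototypes, one instance.
prototypes = [[[4.0] * 3 for _ in range(3)], [[1.0] * 3 for _ in range(3)]]
coeffs = [1.0, -1.0]

gt_mask = assemble_mask(prototypes, coeffs, box=(0, 0, 2, 2))     # tight gt box
loose_mask = assemble_mask(prototypes, coeffs, box=(0, 0, 3, 3))  # looser box
```

With the same prototypes and coefficients, only the box changes which pixels survive, which is why a less accurate box directly lowers mask quality.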

@feiyuhuahuo EfficientDet looks really good wow. The BiFPN should be fairly easy to implement too. I'll have to take a detailed look into this later. Thanks for bringing it to my attention!

ljjyxz123 commented 4 years ago

@dbolya Thanks, I get your idea. The mask coefficient branch, the protonet, and the bbox prediction branch share one backbone, so the backbone parameters are shared, and training the branches at the same time balances the backbone weights so that they work best for all of them.

abhigoku10 commented 4 years ago

@dbolya @fanyix The current implementation is really great, but when we use different backbones such as EfficientNet or ResNeXt, should we change the positions of the P5 and P6 layers?

Markusgami commented 4 years ago

@dbolya Ah, thanks for letting me know people tried YOLOv3 with your idea. I've been trying to get fast instance segmentation working recently. Another work, RetinaMask, is based on RetinaNet like yours, but it takes a Mask R-CNN-like approach. Just this month, another work called CenterMask achieved much better performance by using a better detector and a better mask head. I noticed that in both works the box mAP increased a little after adding the instance head. But YOLACT's box mAP is about 3 points lower than the original detector's. YOLACT would be a fantastic algorithm if its box mAP matched the base detector's.

dbolya commented 4 years ago

@abhigoku10 I don't think any modification is necessary as long as you take the relevant c3, c4, c5 from similar depths in the network.
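
One way to read "similar depths" is by downsampling stride: in ResNet, C3, C4, and C5 sit at strides 8, 16, and 32. Under that assumption, the sketch below picks the deepest stage at each of those strides from an arbitrary backbone. The helper `pick_fpn_inputs` and the stage list are hypothetical and only illustrate the selection rule.

```python
def pick_fpn_inputs(stages, strides=(8, 16, 32)):
    """Return the index of the deepest stage at each requested stride.

    stages: list of (name, stride) tuples in forward order, where stride is
            the total downsampling factor of that stage's output.
    """
    chosen = {}
    for idx, (_name, stride) in enumerate(stages):
        if stride in strides:
            chosen[stride] = idx  # later stages overwrite earlier ones
    missing = [s for s in strides if s not in chosen]
    if missing:
        raise ValueError(f"backbone has no stage at stride(s) {missing}")
    return [chosen[s] for s in strides]

# Hypothetical stage layout of a non-ResNet backbone.
stages = [
    ("stem", 2), ("block1", 4), ("block2", 8),
    ("block3", 16), ("block4", 16), ("block5", 32), ("block6", 32),
]
c3_c4_c5 = pick_fpn_inputs(stages)
```

Here the selected indices point at `block2`, `block4`, and `block6`, which would then feed the FPN in place of ResNet's C3, C4, and C5.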

@Markusgami Sorry, I didn't mean that people have tried YOLACT with YOLOv3. I meant that implementations of YOLOv3 in PyTorch can't reach the same level of performance as the original implementation in Darknet. To my knowledge, nobody has yet tried to combine YOLOv3 and YOLACT.

And yeah, RetinaMask is super slow, which is why we didn't consider it. I assume CenterMask is also slow? We don't exactly use RetinaNet as a detector backbone, just a stripped-down version of it. However, we also can't reproduce RetinaNet's performance in this code base even when using the full version, so there's definitely room for improvement.

feiyuhuahuo commented 4 years ago

@dbolya Here is a new implementation of YOLOv3 in PyTorch: yolov3+ASFF. The author improved YOLOv3 and provides a baseline model. Even the baseline model gets nice performance. Maybe this is helpful.

dbolya commented 4 years ago

@feiyuhuahuo Oooh nice, I'll play around with it if I get time. Should be pretty easy to implement YOLACT in there.

abhigoku10 commented 4 years ago

@Markusgami @dbolya @feiyuhuahuo As mentioned in one of the comments about EfficientDet, the main advantages of that architecture are BiFPN and compound scale fusion. Do you think replacing the FPN with BiFPN would give us better accuracy than YOLACT++?

dbolya commented 4 years ago

@abhigoku10 It probably would, and I'll definitely try it for YOLACTv2.
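
For context on what swapping FPN for BiFPN involves, the core difference at each merge node is a learned, weighted sum of the incoming feature maps. Below is a minimal plain-Python sketch of the "fast normalized fusion" described in the EfficientDet paper; the function name is an assumption, the features are flattened to lists for brevity, and a real BiFPN also adds resizing, convolutions, and repeated top-down/bottom-up passes that are not shown here.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped features as sum(w_i * f_i) / (eps + sum(w_i)).

    features: list of equally long lists of floats (flattened feature maps)
    weights:  one learnable scalar per input feature
    """
    w = [max(0.0, wi) for wi in weights]  # ReLU keeps the weights non-negative
    total = sum(w) + eps                  # eps avoids division by zero
    length = len(features[0])
    return [
        sum(wi * f[i] for wi, f in zip(w, features)) / total
        for i in range(length)
    ]

# Toy example (hypothetical values): two inputs to one merge node.
fused_equal = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0])
fused_gated = fast_normalized_fusion([[1.0, 2.0], [3.0, 4.0]], [1.0, -5.0])
```

With equal weights the result is close to the plain average that an unweighted merge would give; a negative weight is clipped to zero, so that input is effectively ignored.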

Markusgami commented 4 years ago

@abhigoku10 Sure. And a new backbone is worth trying too. I replaced ResNet-50 with EfficientNet-B3, which has almost the same feed-forward speed in TensorRT but is much, much stronger.

lucasjinreal commented 4 years ago

@dbolya Any news on this thread? Any updates on YOLACTv2? I really hope to see an instance segmentation model that is both real-time and accurate. Since this thread started, YOLOv5 has come out and has the best performance so far thanks to its high-level engineering and some tricks. There are now lots of SotA detectors that could be combined with YOLACT. Has anybody tried any of them?