airctic / icevision

An Agnostic Computer Vision Framework - Pluggable to any Training Library: Fastai, Pytorch-Lightning with more to come
https://airctic.github.io/icevision/
Apache License 2.0

Adding a maskrcnn_resnet50_fpn_v2 from TorchVision models #1167

Open medphisiker opened 1 year ago

medphisiker commented 1 year ago

🚀 Feature

Hello.

There is an interesting new version of the Mask R-CNN model in TorchVision: maskrcnn_resnet50_fpn_v2, an improved Mask R-CNN model with a ResNet-50-FPN backbone from the paper "Benchmarking Detection Transfer Learning with Vision Transformers".

The maskrcnn_resnet50_fpn_v2 model gives a noticeable improvement in MS COCO metrics compared with the classic maskrcnn_resnet50_fpn.


I have looked at some fine-tuning examples, and the code for fine-tuning maskrcnn_resnet50_fpn_v2 is identical to the code for maskrcnn_resnet50_fpn. IceVision already supports fine-tuning TorchVision's classic maskrcnn_resnet50_fpn, so it would be great if it also supported the new maskrcnn_resnet50_fpn_v2.

Describe the solution you'd like

It would be great if IceVision also supported TorchVision's new maskrcnn_resnet50_fpn_v2. There are also updated versions of two other detectors: FasterRCNN_ResNet50_FPN_V2 and RetinaNet_ResNet50_FPN_V2.


Describe alternatives you've considered

IceVision already offers many excellent detection networks. However, Faster R-CNN and Mask R-CNN are multi-stage detectors, while most of the more accurate detectors available in the framework are single-stage. In one competition I used YOLOv7, which has a higher detection metric on MS COCO (53). But the winning competitors used the classic multi-stage Faster R-CNN, which scores only 37. It turned out that on a dataset with crowded objects, Faster R-CNN works better than the single-stage YOLOv7, even though the MS COCO metrics differ substantially in YOLOv7's favor.