facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

SSD implementation #456

Open ArutyunovG opened 4 years ago

ArutyunovG commented 4 years ago

Thank you for this awesome work! It is clearly written with enthusiasm and dedication to building the best detection ecosystem.

I have two questions from reading the detectron2 sources:

1) Can RetinaNet be a subclass of a general single-stage meta-architecture? That way, SSD, RetinaNet, and other single-stage anchor-based detectors could probably all be implemented naturally without duplicating code.

2) Can the matcher be implemented as a base class, with the currently implemented Faster R-CNN matching strategy as a subclass? I'm thinking about implementing OHEM as a matcher subclass alongside the current matching strategy.
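A minimal sketch of the kind of abstraction the second question asks about, assuming a simple IoU-threshold rule. `BaseMatcher` and `ThresholdMatcher` are hypothetical names and do not reproduce detectron2's own `Matcher` API; the thresholds are illustrative. Another strategy would, in principle, be a further subclass exposing the same `__call__` interface.

```python
import torch


class BaseMatcher:
    """Hypothetical base class: assigns every anchor to a ground-truth box."""

    def __call__(self, match_quality_matrix: torch.Tensor):
        # match_quality_matrix: (num_gt, num_anchors), e.g. pairwise IoU.
        # Returns (matched gt index per anchor, label per anchor).
        raise NotImplementedError


class ThresholdMatcher(BaseMatcher):
    """Simplified IoU-threshold matching in the spirit of Faster R-CNN."""

    def __init__(self, fg_thresh: float = 0.5, bg_thresh: float = 0.4):
        self.fg_thresh = fg_thresh
        self.bg_thresh = bg_thresh

    def __call__(self, match_quality_matrix: torch.Tensor):
        # Best ground-truth box (and its IoU) for each anchor.
        max_iou, matches = match_quality_matrix.max(dim=0)
        # Labels: 1 = foreground, 0 = background, -1 = ignored (between thresholds).
        labels = torch.full_like(matches, -1)
        labels[max_iou < self.bg_thresh] = 0
        labels[max_iou >= self.fg_thresh] = 1
        return matches, labels


iou = torch.tensor([[0.7, 0.45, 0.1], [0.2, 0.3, 0.55]])
matches, labels = ThresholdMatcher()(iou)
print(matches.tolist(), labels.tolist())
```

Here the second anchor's best IoU (0.45) falls between the two thresholds, so it is ignored rather than used as a positive or a negative.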

Once again, thank you for the work.

ppwwyyxx commented 4 years ago
  1. It could, but it's unclear whether there are any benefits, since we don't have other single-stage detectors now. It's unclear whether anything nontrivial can be shared between RetinaNet and SSD.

  2. I'm not sure which matching strategy you're referring to, as there are many. In general, you can always subclass the lowest-level enclosing module of the part you want to modify.

ArutyunovG commented 4 years ago

@ppwwyyxx I'm going to check the difference between RetinaNet and SSD by converting the model from Wei Liu's original Caffe implementation to the RetinaNet implementation in detectron2. From the RetinaNet code in detectron2, it seems that the only change needed is an option to not share the same RetinaNet head across feature levels; in SSD, those heads are different. Everything else is the same. If that's the case, obtaining SSD from RetinaNet takes only a few lines of code to support sharing or not sharing the head. I still need to verify this, which will take a few days, so please don't close this issue in the meantime.
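A minimal sketch of the proposed sharing option, assuming a head made of only the final classification and box predictors (no intermediate conv tower). `SingleStageHead` and its `shared` flag are hypothetical stand-ins for the idea, not detectron2 code. Taking `in_channels` and `num_anchors` per level matters for SSD, whose feature levels can differ in both.

```python
import torch
from torch import nn


class SingleStageHead(nn.Module):
    """Hypothetical head: one predictor shared across levels, or one per level."""

    def __init__(self, in_channels, num_anchors, num_classes, shared=True):
        super().__init__()
        # in_channels, num_anchors: lists with one entry per feature level.
        # num_classes: columns the loss expects (K for sigmoid, K + 1 with background).
        if shared and (len(set(in_channels)) != 1 or len(set(num_anchors)) != 1):
            raise ValueError("a shared head needs identical channels and anchors on every level")
        self.shared = shared
        # RetinaNet-style: a single predictor reused on every level.
        # SSD-style: a separate predictor for each level.
        num_predictors = 1 if shared else len(in_channels)
        self.cls_score = nn.ModuleList([
            nn.Conv2d(in_channels[i], num_anchors[i] * num_classes, kernel_size=3, padding=1)
            for i in range(num_predictors)
        ])
        self.bbox_pred = nn.ModuleList([
            nn.Conv2d(in_channels[i], num_anchors[i] * 4, kernel_size=3, padding=1)
            for i in range(num_predictors)
        ])

    def forward(self, features):
        logits, deltas = [], []
        for level, x in enumerate(features):
            idx = 0 if self.shared else level
            logits.append(self.cls_score[idx](x))
            deltas.append(self.bbox_pred[idx](x))
        return logits, deltas


# SSD-like usage: two levels with different channels and anchors per location.
head = SingleStageHead([8, 4], [4, 6], num_classes=3, shared=False)
logits, deltas = head([torch.zeros(1, 8, 4, 4), torch.zeros(1, 4, 2, 2)])
```

With `shared=True` the module holds exactly one predictor pair, which is why the constructor rejects levels whose channel or anchor counts differ.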

ArutyunovG commented 4 years ago

@ppwwyyxx

Please take a look at this diff. It turns RetinaNet into SSD when RETINANET.SHARED_HEAD == False.

That said, I found the following technical details that differ from the original SSD implementation. While I don't find them crucial, they are worth mentioning for clarity:

1) SSD in Caffe uses softmax with a catch-all background class for box_cls, while RetinaNet in detectron2 uses sigmoid without a background class.

2) SSD centers anchors at pixel centers, while the anchor_generator in detectron2 aligns them to the top-left (i.e., detectron2's anchors are shifted by stride / 2 from their natural position).

3) The Caffe SSD implementation corresponds to RETINANET.NUM_CONVS == 0 in the heads. This is an available option rather than a real difference, though.

4) SSD uses a slightly different anchor generator.
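To make differences 1 and 2 concrete, here is a small sketch in plain PyTorch. The function names are hypothetical, and placing the background logit in column 0 is an assumption of this sketch. An `offset` of 0.0 gives top-left-aligned anchor centers; 0.5 gives cell-centered ones.

```python
import torch


def anchor_centers(feature_size, stride, offset):
    # Anchor centers along one axis of a feature map, in input-image pixels.
    # offset=0.0: centers at multiples of the stride (top-left aligned).
    # offset=0.5: centers in the middle of each stride-sized cell.
    return (torch.arange(feature_size, dtype=torch.float32) + offset) * stride


def softmax_scores(logits):
    # SSD-style: logits has K + 1 columns; column 0 is the catch-all
    # background class (assumed here). Probabilities over all K + 1
    # columns sum to 1, and the background column is dropped afterwards.
    return logits.softmax(dim=-1)[:, 1:]


def sigmoid_scores(logits):
    # RetinaNet-style: logits has K columns and no background class;
    # each class gets an independent probability.
    return logits.sigmoid()


print(anchor_centers(3, 8, 0.0).tolist())
print(anchor_centers(3, 8, 0.5).tolist())
```

For a 3-cell axis with stride 8, the first call gives centers 0, 8, 16 and the second gives 4, 12, 20, i.e. a shift of stride / 2.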

All that said, I personally don't see a problem. Otherwise, the RPN in detectron2 would not be an RPN either, because it has the same differences 1, 2, and 4 compared to the original implementation; yet nobody objects to calling it an RPN, since it is conceptually still the same RPN.

What do you think about adding this RETINANET.SHARED_HEAD parameter so that users can train SSD naturally without duplicating code? Or are you perhaps not interested in SSD?

ppwwyyxx commented 4 years ago

We would love to have SSD, but under the right abstraction. Since SSD is not a RetinaNet, we shouldn't implement it as an option of RetinaNet; it should get its own meta architecture. For the same reason, RetinaNet is not implemented as an RPN (as it is in maskrcnn-benchmark), even though some options could be added to achieve that.

Regarding the other differences, we do need all implementations to match the originals as closely as possible, unless there are reasonable arguments that a difference either doesn't matter or is an improvement.

ArutyunovG commented 4 years ago

We can divide porting the SSD implementation into two parts: 1) inference and 2) training.

So this is where inference stands at the moment.

I converted the original SSD300 (300x300) model. It is based on a modified VGG16.
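For reference, a quick count of SSD300's default boxes, using the feature-map sizes and boxes per location reported in the SSD paper (six levels from 38x38 down to 1x1, with 4 or 6 boxes per location). The level list is typed in by hand for this sketch rather than read from the converted model.

```python
# (feature-map side length, default boxes per location) for each SSD300 level,
# as reported in the SSD paper.
SSD300_LEVELS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]


def total_default_boxes(levels):
    # Every spatial location on a level contributes `boxes` default boxes.
    return sum(size * size * boxes for size, boxes in levels)


print(total_default_boxes(SSD300_LEVELS))  # 8732
```

The 38x38 level alone contributes 5776 of the 8732 boxes, which is one reason SSD300's low-level features dominate the anchor count.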

Here is a link to the current state, from which we can refactor SSD inference into a form you find acceptable.

To run demo/demo.py, you can download the converted weights from this link.

What kind of refactoring would you suggest? I've intentionally made minimal changes to the copy-pasted RetinaNet code, so that it can be merged with SSD (while keeping two separate meta-architectures) into a common base nn.Module, if needed.
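One possible shape for such a base module, sketched under assumptions: the backbone returns a list of feature maps, the head returns per-level `(logits, deltas)`, and only class scoring differs between the two meta-architectures. All class names are hypothetical, and box decoding, anchor matching, losses, and NMS are omitted.

```python
import torch
from torch import nn


class SingleStageDetector(nn.Module):
    """Hypothetical base shared by RetinaNet-like and SSD-like meta-architectures."""

    def __init__(self, backbone, head):
        super().__init__()
        self.backbone = backbone  # images -> list of per-level feature maps
        self.head = head          # features -> (per-level logits, per-level box deltas)

    def forward(self, images):
        features = self.backbone(images)
        logits, deltas = self.head(features)
        return self.class_scores(logits), deltas

    def class_scores(self, logits):
        # The only step this sketch lets subclasses override.
        raise NotImplementedError


class RetinaNetLike(SingleStageDetector):
    def class_scores(self, logits):
        # Independent sigmoid per class, no background class.
        return [x.sigmoid() for x in logits]


class SSDLike(SingleStageDetector):
    def __init__(self, backbone, head, num_classes):
        super().__init__(backbone, head)
        self.num_classes = num_classes  # K foreground classes; head emits K + 1 per anchor

    def class_scores(self, logits):
        scores = []
        for x in logits:
            n, _, h, w = x.shape
            # (N, A * (K + 1), H, W) -> (N, A, K + 1, H, W), softmax over classes.
            x = x.reshape(n, -1, self.num_classes + 1, h, w)
            scores.append(x.softmax(dim=2))
        return scores
```

Keeping `forward` in the base class and overriding a single hook is one way to avoid copy-pasting the pipeline while still registering two distinct meta-architectures.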

ArutyunovG commented 4 years ago

Implementing OHEM (my second question) is part of the original SSD training approach.
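A minimal per-image sketch of the SSD-style mining step, assuming per-anchor classification losses are already computed and background is label 0. The SSD paper describes this as hard negative mining with a negative:positive ratio of at most 3:1; it is related to, but simpler than, OHEM as proposed for Fast R-CNN. The function name `hard_negative_mask` is hypothetical.

```python
import torch


def hard_negative_mask(cls_loss, labels, neg_pos_ratio=3):
    # cls_loss: (num_anchors,) per-anchor classification loss for one image.
    # labels:   (num_anchors,) with 0 = background, > 0 = foreground class.
    # Returns a bool mask: all positives plus the highest-loss negatives,
    # at most neg_pos_ratio negatives per positive.
    pos = labels > 0
    num_neg = min(int(neg_pos_ratio * pos.sum().item()), int((~pos).sum().item()))
    # Positives get -inf so they are never ranked as negatives.
    neg_loss = cls_loss.masked_fill(pos, float("-inf"))
    neg = torch.zeros_like(pos)
    if num_neg > 0:
        neg[neg_loss.topk(num_neg).indices] = True
    return pos | neg


loss = torch.tensor([0.1, 0.9, 0.5, 0.3, 0.8, 0.2])
labels = torch.tensor([1, 0, 0, 0, 0, 0])
print(hard_negative_mask(loss, labels).tolist())
```

With one positive, the three negatives with the highest loss (anchors 1, 4, and 2) are kept and the two easiest negatives are dropped from the classification loss.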

jakobamb commented 4 years ago

Hey guys, thanks for the awesome work. Any updates regarding SSD?

Sharathmk99 commented 4 years ago

+1

radu-diaconescu13 commented 4 years ago

I'm also interested in an SSD implementation.

cognitiveRobot commented 4 years ago

I am also interested. Thanks.

hiyyg commented 4 years ago

Hi @ppwwyyxx, @ArutyunovG , maybe there is a reference implementation: https://github.com/Megvii-BaseDetection/BorderDet/blob/master/cvpods/modeling/meta_arch/ssd.py.

BTW, I am also interested in having a YOLO implementation in detectron2, since YOLOv4 has been shown to be very accurate and fast. (ref. https://github.com/Megvii-BaseDetection/BorderDet/blob/master/cvpods/modeling/meta_arch/yolov3.py)

ArutyunovG commented 2 years ago

I deleted the SSD implementation and updated it for personal use. The only thing it still has in common with the original SSD is the meta-architecture; you can plug in your own backbones, augmentations, and so on. Here is the link, in case anyone is still interested.