facebookresearch / detectron2

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
https://detectron2.readthedocs.io/en/latest/
Apache License 2.0

torchvision vs detectron2 #117

Closed: aminekechaou closed this issue 5 years ago

aminekechaou commented 5 years ago

Hi, are there differences between the models implemented in torchvision and the ones implemented in detectron2 (ResNet, FPN, Faster R-CNN, ...)?

There seems to be a big overlap between the two projects (models, structures, utils). Is there an intention to consolidate them, or at least a general recommendation as to which one to use?

xksteven commented 5 years ago

Here's my current understanding of the differences:

Torchvision's use case is geared more toward data processing, specifically for images and videos. It also provides common pretrained vision models that one can fine-tune or build on.

Detectron2 provides an entire pipeline, from ingesting the data (using torchvision) to training, evaluation, and visualization, for vision tasks such as detection, panoptic segmentation, and keypoint detection.

Since the models used in Detectron2 are more use-case specific, I do not imagine all of them will be made available in torchvision, but some of them probably will. So, to answer your question: if you need to process your data or use a model that torchvision already provides, just use torchvision. If you need a larger framework for a detection task, Detectron2 does a lot of the legwork for you.

rbgirshick commented 5 years ago

@xksteven's summary is good. A few thoughts to add to it:

Torchvision provides established, reusable components. Detectron2 depends on torchvision and currently makes use of several of those components. We expect detectron2 to make more use of torchvision as time goes on, with the goal of reducing redundancy whenever it makes sense.

Torchvision also provides reference implementations for some higher-level "components", like Mask R-CNN. With respect to these, in my view the two codebases are optimized for different use cases.

For these reference models, torchvision makes some common use cases simple, e.g., running a standard, already trained model on an image. It is intentionally designed to be lightweight and simple in this regard. The tradeoff is that it doesn't have some features like a flexible configuration system for controlling experiments or a registration mechanism for defining new system components while also reusing parts of existing models. Detectron2 is designed foremost with the research use-case in mind and so features like these are essential, even though they bring some additional code complexity.
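To make the "registration mechanism" and "configuration system" ideas above concrete, here is a minimal pure-Python sketch of the general pattern: components register themselves under a name, a new component reuses an existing one by subclassing it, and a config dictionary (rather than code changes) selects what gets built. All names here (`Registry`, `BACKBONES`, `ResNetBackbone`, `ResNetWithExtraBlock`, `build_backbone`) are hypothetical and chosen for illustration only. Detectron2's actual registries (its `Registry` comes from fvcore) and its config system (`CfgNode`, built on fvcore/YACS) have their own APIs, which this sketch does not attempt to reproduce.

```python
from typing import Callable, Dict


class Registry:
    """Minimal name -> class registry (illustrative sketch, not detectron2's implementation)."""

    def __init__(self, name: str) -> None:
        self._name = name
        self._obj_map: Dict[str, type] = {}

    def register(self) -> Callable[[type], type]:
        # Used as a class decorator: @REGISTRY.register()
        def deco(cls: type) -> type:
            key = cls.__name__
            if key in self._obj_map:
                raise KeyError(f"{key!r} already registered in {self._name}")
            self._obj_map[key] = cls
            return cls

        return deco

    def get(self, key: str) -> type:
        # Look up a registered class by name; fail loudly on typos in the config.
        if key not in self._obj_map:
            raise KeyError(f"No object named {key!r} in registry {self._name}")
        return self._obj_map[key]


# Hypothetical registry of backbone components.
BACKBONES = Registry("BACKBONES")


@BACKBONES.register()
class ResNetBackbone:
    def __init__(self, depth: int) -> None:
        self.depth = depth

    def describe(self) -> str:
        return f"ResNet-{self.depth}"


# A new research component reuses the existing one by subclassing it and
# registers under its own name; the builder below does not need to change.
@BACKBONES.register()
class ResNetWithExtraBlock(ResNetBackbone):
    def describe(self) -> str:
        return super().describe() + " + extra block"


def build_backbone(cfg: dict):
    # The experiment config, not the code, decides which component is built.
    cls = BACKBONES.get(cfg["backbone"]["name"])
    return cls(depth=cfg["backbone"]["depth"])


if __name__ == "__main__":
    cfg = {"backbone": {"name": "ResNetWithExtraBlock", "depth": 50}}
    print(build_backbone(cfg).describe())  # ResNet-50 + extra block
```

The indirection through a name lookup is the extra complexity the comment above refers to: a lightweight library can simply expose constructor functions, whereas a research framework gains the ability to swap components from a config file without editing the model-building code.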

Detectron2 will continue to evolve with the goal of making research use cases more productive and easing the research-to-production path. We expect to make more use of torchvision and, at some point, to see low-level, reusable components that were initially developed in detectron2 migrate to fvcore, and then eventually to torchvision if they are of sufficiently general use.

@fmassa might have some additional thoughts or refinements of my points.

aminekechaou commented 5 years ago

Thanks for the clarifications, @xksteven @rbgirshick!