facebookresearch / maskrcnn-benchmark

Fast, modular reference implementation of Instance Segmentation and Object Detection algorithms in PyTorch.

Advantages compared to tensor-flow version Mask-RCNN #449

Open YubinXie opened 5 years ago

YubinXie commented 5 years ago

❓ Questions and Help

I am curious what the advantages of this PyTorch version of Mask R-CNN are compared to the TensorFlow one, e.g., accuracy, features/functionality, speed.

fmassa commented 5 years ago

Hi,

You can find the accuracy / speed / memory usage of maskrcnn-benchmark in https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/MODEL_ZOO.md

Note that not all models that we support are included in the MODEL_ZOO, like RetinaNet, Keypoint R-CNN and models with Group Normalization.

This implementation exactly reproduces the results from Detectron (including adding +1 to the bounding-box widths/heights, plus other details).
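For illustration, that "+1" refers to Detectron's inclusive box-size convention; a minimal sketch of what it means:

```python
# Detectron-style (inclusive) box size convention, shown for illustration only:
# a box (x1, y1, x2, y2) is taken to span pixels x1..x2 and y1..y2 inclusive.
def box_width_height(x1, y1, x2, y2):
    return x2 - x1 + 1, y2 - y1 + 1
```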

YubinXie commented 5 years ago

@fmassa Hi Francisco, thanks for your reply.

The most popular Mask R-CNN before this PyTorch version was the TensorFlow one (https://github.com/matterport/Mask_RCNN). It would be nice to compare these two versions.

Due to a lack of specific information, I can't directly compare the two versions myself. The benchmarks of the other version are here: https://github.com/matterport/Mask_RCNN/releases

BTW, the TensorFlow version provides many examples as Jupyter notebooks. It would be nice if you could provide similar (or even the same) examples for comparison. Thank you so much!

fmassa commented 5 years ago

Thanks, I wasn't aware of the benchmarks for the matterport implementation of Mask R-CNN, nor of their Jupyter notebooks; they look very good!

Performance comparison

Matterport

An apples-to-apples accuracy comparison doesn't seem straightforward, as they haven't specified which model they are reporting results for. Here are the results they reported:

Evaluate annotation type *bbox*
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.347
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.544
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.377
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.163
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.390
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.486
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.295
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.424
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.433
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.214
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.481
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.601
Evaluate annotation type *segm*
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.296
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.510
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.306
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.128
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.330
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.430
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.258
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.369
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.376
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.173
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.417
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.538

maskrcnn-benchmark

Here are the accuracies for Mask R-CNN R-50-C4:

Evaluate annotation type *bbox* 
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.356
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.560
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.383
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.193
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.398
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.495
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.305
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.468
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.486
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.291
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.537
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.652
Evaluate annotation type *segm*
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.315
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.527
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.332
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.131
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.349
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.497
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.278
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.418
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.433
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.231
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.484
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.616

and for Mask R-CNN R-50-FPN:

Evaluate annotation type *bbox* 
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.378
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.592
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.411
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.215
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.411
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.499
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.313
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.490
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.514
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.327
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.551
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.652
Evaluate annotation type *segm* 
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.342
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.560
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.363
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.156
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.368
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.506
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.293
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.448
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.468
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.272
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.505
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.623

From a quick look, both of our models give slightly better accuracies than matterport's.
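For context, the tables above are the standard summary printed by pycocotools. A minimal sketch of how such a summary is produced from COCO-format ground-truth and detection files (the file names below are placeholders):

```python
# Minimal COCO evaluation sketch using pycocotools; file paths are placeholders.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_minival2014.json")  # ground-truth annotations
coco_dt = coco_gt.loadRes("bbox_predictions.json")        # model detections

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")    # use "segm" for mask results
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # prints an AP/AR table like the ones above
```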

More notebooks in maskrcnn-benchmark

Adding notebooks similar to matterport's would be a very nice addition; contributions are more than welcome!

YubinXie commented 5 years ago

@fmassa This is very impressive! I am still working on loading my own data. It is harder than in the other version, given that there is no notebook example of loading your own data and running your own training... I'd be happy to contribute such an example once I am able to run my data.

fmassa commented 5 years ago

Having a notebook with a step-by-step implementation of training would be great!

Also, check these issues; they might have helpful information: https://github.com/facebookresearch/maskrcnn-benchmark/issues/159 https://github.com/facebookresearch/maskrcnn-benchmark/issues/297 https://github.com/facebookresearch/maskrcnn-benchmark/issues/15
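To give a rough idea of what such a notebook would cover, here is a sketch of a custom dataset in the shape maskrcnn-benchmark expects, based on the issues linked above; the annotation layout and field handling are illustrative and should be checked against the repo:

```python
# Hypothetical custom dataset sketch for maskrcnn-benchmark (illustrative only).
# __getitem__ is expected to return (image, target, idx), where target is a BoxList.
import torch
from PIL import Image
from maskrcnn_benchmark.structures.bounding_box import BoxList


class MyDataset(torch.utils.data.Dataset):
    def __init__(self, image_paths, annotations, transforms=None):
        # annotations[i] is assumed to hold "boxes" (xyxy) and "labels" for image i;
        # for instance segmentation you would also attach a "masks" field to the target.
        self.image_paths = image_paths
        self.annotations = annotations
        self.transforms = transforms

    def __getitem__(self, idx):
        img = Image.open(self.image_paths[idx]).convert("RGB")
        ann = self.annotations[idx]

        boxes = torch.as_tensor(ann["boxes"], dtype=torch.float32)  # N x 4, xyxy
        target = BoxList(boxes, img.size, mode="xyxy")
        target.add_field("labels", torch.as_tensor(ann["labels"], dtype=torch.int64))

        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target, idx

    def __len__(self):
        return len(self.image_paths)

    def get_img_info(self, idx):
        # used by the data loader to group images of similar aspect ratio
        img = Image.open(self.image_paths[idx])
        return {"height": img.height, "width": img.width}
```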

ppwwyyxx commented 5 years ago

The released model in matterport's /releases is an R101-FPN model, with 34.7 box AP / 29.6 mask AP. This repo contains an R101-FPN model (1x schedule) with 40.1 box AP / 36.1 mask AP.

You can clearly see an accuracy gap. In fact, the lack of good accuracy is their very first issue (https://github.com/matterport/Mask_RCNN/issues/1) and it has not yet been solved.

fmassa commented 5 years ago

Thanks for the information @ppwwyyxx! Do you know by any chance if maskrcnn-benchmark is also generally faster than matterport's implementation during training?

ppwwyyxx commented 5 years ago

I haven't seen any mention of speed there, and it also does not use what we call the "standard schedule", so it's hard to make comparisons. However, just from the fact that they use Keras, I would bet fairly confidently that it is less efficient than maskrcnn-benchmark.

YubinXie commented 5 years ago

@fmassa Just to confirm, what is the training and validation data for your benchmark?

fmassa commented 5 years ago

@YubinXie the same as in Detectron: COCO 2017 train (and val), or equivalently COCO 2014 train + valminusminival (and minival)
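
For reference, a quick way to check which splits a given config trains and evaluates on (the config path and dataset names below are taken from the repo's default configs and are only illustrative):

```python
# Illustrative check of the train/test splits used by a maskrcnn-benchmark config.
from maskrcnn_benchmark.config import cfg

cfg.merge_from_file("configs/e2e_mask_rcnn_R_50_FPN_1x.yaml")
print(cfg.DATASETS.TRAIN)  # e.g. ("coco_2014_train", "coco_2014_valminusminival")
print(cfg.DATASETS.TEST)   # e.g. ("coco_2014_minival",)
```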