facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0

How to understand test scores? #898

Open ambigus9 opened 5 years ago

ambigus9 commented 5 years ago

I would like to understand the precision of the model I trained. Here are the results reported after training and evaluation (I assume):

INFO json_dataset_evaluator.py: 162: Writing bbox results json to: /detectron/can_rpn3/test/can_val/generalized_rcnn/bbox_can_val_results.json
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.09s).
Accumulating evaluation results...
DONE (t=0.04s).
INFO json_dataset_evaluator.py: 222: ~~~~ Mean and per-category AP @ IoU=[0.50,0.95] ~~~~
INFO json_dataset_evaluator.py: 223: 7.1
INFO json_dataset_evaluator.py: 231: 0.0
INFO json_dataset_evaluator.py: 231: 4.8
INFO json_dataset_evaluator.py: 231: 0.1
INFO json_dataset_evaluator.py: 231: 23.3
INFO json_dataset_evaluator.py: 232: ~~~~ Summary metrics ~~~~
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.071
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.099
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.074
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.071
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.153
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.155
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.155
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.157
INFO json_dataset_evaluator.py: 199: Wrote json eval results to: can_rpn3/test/can_val/generalized_rcnn/detection_results.pkl
INFO task_evaluation.py:  61: Evaluating bounding boxes is done!
INFO task_evaluation.py: 104: Evaluating segmentations
INFO json_dataset_evaluator.py:  83: Writing segmentation results json to: /detectron/can_rpn3/test/can_val/generalized_rcnn/segmentations_can_val_results.json
Loading and preparing results...
DONE (t=0.10s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *segm*
DONE (t=0.31s).
Accumulating evaluation results...
DONE (t=0.03s).
INFO json_dataset_evaluator.py: 222: ~~~~ Mean and per-category AP @ IoU=[0.50,0.95] ~~~~
INFO json_dataset_evaluator.py: 223: 6.0
INFO json_dataset_evaluator.py: 231: 0.0
INFO json_dataset_evaluator.py: 231: 3.1
INFO json_dataset_evaluator.py: 231: 0.0
INFO json_dataset_evaluator.py: 231: 20.9
INFO json_dataset_evaluator.py: 232: ~~~~ Summary metrics ~~~~
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.060
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.082
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.061
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.060
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.129
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.131
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.131
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.132
INFO json_dataset_evaluator.py: 122: Wrote json eval results to: can_rpn3/test/can_val/generalized_rcnn/segmentation_results.pkl
INFO task_evaluation.py:  65: Evaluating segmentations is done!
INFO task_evaluation.py: 180: copypaste: Dataset: can_val
INFO task_evaluation.py: 182: copypaste: Task: box
INFO task_evaluation.py: 185: copypaste: AP,AP50,AP75,APs,APm,APl
INFO task_evaluation.py: 186: copypaste: 0.0706,0.0993,0.0735,0.0000,0.0000,0.0707
INFO task_evaluation.py: 182: copypaste: Task: mask
INFO task_evaluation.py: 185: copypaste: AP,AP50,AP75,APs,APm,APl
INFO task_evaluation.py: 186: copypaste: 0.0601,0.0821,0.0612,0.0000,0.0000,0.0601
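
For reference, the summary table appears to come straight from pycocotools' COCOeval, so I believe the same twelve AP/AR rows can be reproduced from the results json that the evaluator writes. A minimal sketch for the bbox task; the ground-truth annotation path here is an assumption, while the results path is taken from the log above:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

ann_file = "annotations/can_val.json"  # assumed: ground-truth json for can_val
res_file = ("/detectron/can_rpn3/test/can_val/generalized_rcnn/"
            "bbox_can_val_results.json")  # path from the log above

coco_gt = COCO(ann_file)             # load ground-truth annotations
coco_dt = coco_gt.loadRes(res_file)  # load detections written by Detectron

coco_eval = COCOeval(coco_gt, coco_dt, "bbox")
coco_eval.evaluate()    # per-image, per-category matching
coco_eval.accumulate()  # build the precision/recall arrays
coco_eval.summarize()   # prints the twelve AP/AR lines shown above
```

(The segmentation table would be the same call with "segm" and the segmentations json.)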

As far as I can understand, there are two reported results: bbox and segmentation. There is also a per-category score, which I assume is one AP value per class. In this particular case we have:

bbox: 0.0, 4.8, 0.1, 23.3 (mean 7.1)

segmentation: 0.0, 3.1, 0.0, 20.9 (mean 6.0)

Are these assumptions correct?
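
Regarding the per-category lines (json_dataset_evaluator.py: 231), my understanding is that they are obtained by slicing COCOeval's accumulated precision array per class. A rough sketch of that computation, reusing coco_eval and coco_gt from the snippet above and assuming the default COCOeval parameters (area range "all" at index 0, maxDets=100 at index 2):

```python
import numpy as np

# coco_eval.eval["precision"] has shape [T, R, K, A, M]:
#   T = IoU thresholds (0.50:0.05:0.95), R = recall thresholds,
#   K = categories, A = area ranges (all/small/medium/large), M = maxDets
precision = coco_eval.eval["precision"]

for k, cat_id in enumerate(coco_eval.params.catIds):
    # Slice one category over all IoU thresholds and recall points,
    # at area range "all" (index 0) and maxDets=100 (index 2).
    p = precision[:, :, k, 0, 2]
    valid = p[p > -1]  # -1 marks settings with no ground truth
    ap = valid.mean() if valid.size else float("nan")
    name = coco_gt.loadCats(cat_id)[0]["name"]
    print("{}: {:.1f}".format(name, 100 * ap))
```

If that is right, the 0.0 / 4.8 / 0.1 / 23.3 bbox values would be these per-class means, and their average is the 7.1 headline AP.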