facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0

How to understand test scores? #898

Open ambigus9 opened 5 years ago

ambigus9 commented 5 years ago

I would like to understand the precision of the model I trained. Here are the results reported after training and evaluation (I assume):

INFO json_dataset_evaluator.py: 162: Writing bbox results json to: /detectron/can_rpn3/test/can_val/generalized_rcnn/bbox_can_val_results.json
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=0.09s).
Accumulating evaluation results...
DONE (t=0.04s).
INFO json_dataset_evaluator.py: 222: ~~~~ Mean and per-category AP @ IoU=[0.50,0.95] ~~~~
INFO json_dataset_evaluator.py: 223: 7.1
INFO json_dataset_evaluator.py: 231: 0.0
INFO json_dataset_evaluator.py: 231: 4.8
INFO json_dataset_evaluator.py: 231: 0.1
INFO json_dataset_evaluator.py: 231: 23.3
INFO json_dataset_evaluator.py: 232: ~~~~ Summary metrics ~~~~
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.071
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.099
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.074
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.071
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.153
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.155
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.155
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.157
INFO json_dataset_evaluator.py: 199: Wrote json eval results to: can_rpn3/test/can_val/generalized_rcnn/detection_results.pkl
INFO task_evaluation.py:  61: Evaluating bounding boxes is done!
INFO task_evaluation.py: 104: Evaluating segmentations
INFO json_dataset_evaluator.py:  83: Writing segmentation results json to: /detectron/can_rpn3/test/can_val/generalized_rcnn/segmentations_can_val_results.json
Loading and preparing results...
DONE (t=0.10s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *segm*
DONE (t=0.31s).
Accumulating evaluation results...
DONE (t=0.03s).
INFO json_dataset_evaluator.py: 222: ~~~~ Mean and per-category AP @ IoU=[0.50,0.95] ~~~~
INFO json_dataset_evaluator.py: 223: 6.0
INFO json_dataset_evaluator.py: 231: 0.0
INFO json_dataset_evaluator.py: 231: 3.1
INFO json_dataset_evaluator.py: 231: 0.0
INFO json_dataset_evaluator.py: 231: 20.9
INFO json_dataset_evaluator.py: 232: ~~~~ Summary metrics ~~~~
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.060
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.082
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.061
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.060
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.129
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.131
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.131
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.132
INFO json_dataset_evaluator.py: 122: Wrote json eval results to: can_rpn3/test/can_val/generalized_rcnn/segmentation_results.pkl
INFO task_evaluation.py:  65: Evaluating segmentations is done!
INFO task_evaluation.py: 180: copypaste: Dataset: can_val
INFO task_evaluation.py: 182: copypaste: Task: box
INFO task_evaluation.py: 185: copypaste: AP,AP50,AP75,APs,APm,APl
INFO task_evaluation.py: 186: copypaste: 0.0706,0.0993,0.0735,0.0000,0.0000,0.0707
INFO task_evaluation.py: 182: copypaste: Task: mask
INFO task_evaluation.py: 185: copypaste: AP,AP50,AP75,APs,APm,APl
INFO task_evaluation.py: 186: copypaste: 0.0601,0.0821,0.0612,0.0000,0.0000,0.0601
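
For reference, the summary table appears to come straight from pycocotools' COCOeval, so I believe the same twelve AP/AR rows can be reproduced from the results json that the evaluator writes. A minimal sketch for the bbox task; the ground-truth annotation path here is an assumption, while the results path is taken from the log above:

```python
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

ann_file = "annotations/can_val.json"  # assumed: ground-truth json for can_val
res_file = ("/detectron/can_rpn3/test/can_val/generalized_rcnn/"
            "bbox_can_val_results.json")  # path from the log above

coco_gt = COCO(ann_file)             # load ground-truth annotations
coco_dt = coco_gt.loadRes(res_file)  # load detections written by Detectron

coco_eval = COCOeval(coco_gt, coco_dt, "bbox")
coco_eval.evaluate()    # per-image, per-category matching
coco_eval.accumulate()  # build the precision/recall arrays
coco_eval.summarize()   # prints the twelve AP/AR lines shown above
```

(The segmentation table would be the same call with "segm" and the segmentations json.)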

As far as I can understand, there are two reported results: bbox and segmentation. There is also a per-category score, which I assume is one AP value per class. In this particular case we have:

bbox: 0.0, 4.8, 0.1, 23.3 (mean 7.1)

segmentation: 0.0, 3.1, 0.0, 20.9 (mean 6.0)

Are these assumptions correct?
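
Regarding the per-category lines (json_dataset_evaluator.py: 231), my understanding is that they are obtained by slicing COCOeval's accumulated precision array per class. A rough sketch of that computation, reusing coco_eval and coco_gt from the snippet above and assuming the default COCOeval parameters (area range "all" at index 0, maxDets=100 at index 2):

```python
import numpy as np

# coco_eval.eval["precision"] has shape [T, R, K, A, M]:
#   T = IoU thresholds (0.50:0.05:0.95), R = recall thresholds,
#   K = categories, A = area ranges (all/small/medium/large), M = maxDets
precision = coco_eval.eval["precision"]

for k, cat_id in enumerate(coco_eval.params.catIds):
    # Slice one category over all IoU thresholds and recall points,
    # at area range "all" (index 0) and maxDets=100 (index 2).
    p = precision[:, :, k, 0, 2]
    valid = p[p > -1]  # -1 marks settings with no ground truth
    ap = valid.mean() if valid.size else float("nan")
    name = coco_gt.loadCats(cat_id)[0]["name"]
    print("{}: {:.1f}".format(name, 100 * ap))
```

If that is right, the 0.0 / 4.8 / 0.1 / 23.3 bbox values would be these per-class means, and their average is the 7.1 headline AP.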