Hello, I've come pretty far with all the good documentation and info from this repository, thank you for that! 👌
I have a question regarding the evaluation, specifically the recall of the trained model. The recall calculated by the default `inference_on_dataset()` is lower than I expect it to be.
My trained model has 1 class. I have 14 images in my training dataset and 3 images in my validation dataset, and almost every image has 18 objects annotated in COCO format.
Expected results
I looked at the predicted results on the validation dataset with the following code:
```python
import cv2
from detectron2.data import DatasetCatalog

# Run the predictor on every image in the validation set and print the prediction scores
dataset_dicts = DatasetCatalog.get(list(cfg.DATASETS.TEST)[0])
for d in dataset_dicts:
    file_name = d["file_name"]
    img = cv2.imread(file_name)
    predictions = predictor(img)["instances"].to("cpu")
    pred_scores = predictions.scores if predictions.has("scores") else None
    print(pred_scores)
```
With the following output (18 object predictions for each of the 3 images):
So as I understand it, recall is calculated with the formula:
recall = true positives / number of ground truths
which should return 1.0 when all IoUs are greater than the threshold.
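To make explicit what I mean by that, this is roughly how I compute recall at a single IoU threshold (just a sketch with a made-up helper name; `ious_per_gt` holds the best IoU I measured between each ground-truth box and any prediction):

```python
import numpy as np

def recall_at_threshold(ious_per_gt, iou_threshold=0.5):
    """recall = true positives / number of ground truths.

    A ground truth counts as a true positive when the best IoU of any
    prediction with that ground truth clears the threshold.
    """
    ious = np.asarray(ious_per_gt, dtype=float)
    true_positives = (ious >= iou_threshold).sum()
    return true_positives / len(ious)
```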
I have calculated all IoUs myself, and they are as follows:
When I calculated the AR myself (even over the full IoU=0.50:0.95 range), my outcome was 1.00.
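Concretely, that number came from averaging the recall above over the COCO IoU thresholds 0.50, 0.55, ..., 0.95 (again just a sketch, reusing the `recall_at_threshold` helper and the `ious_per_gt` values from above):

```python
import numpy as np

# COCO-style AR: mean recall over the 10 IoU thresholds 0.50:0.05:0.95
iou_thresholds = np.arange(0.50, 1.00, 0.05)
average_recall = np.mean([recall_at_threshold(ious_per_gt, t) for t in iou_thresholds])
print(average_recall)  # this gave me 1.00 on my own IoUs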
So what am I not taking into account?
Is there detailed documentation about these calculations? (I had a hard time understanding the source code.)
Actual results
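For context, the output below comes from an evaluation call roughly like this (a sketch; the exact `COCOEvaluator` arguments depend on the detectron2 version, and `cfg` and `predictor` are the config and `DefaultPredictor` from my training notebook):

```python
from detectron2.data import build_detection_test_loader
from detectron2.evaluation import COCOEvaluator, inference_on_dataset

# Evaluate the trained model on the validation split registered in cfg.DATASETS.TEST
val_dataset = cfg.DATASETS.TEST[0]
evaluator = COCOEvaluator(val_dataset, output_dir="./output/inference")
val_loader = build_detection_test_loader(cfg, val_dataset)
print(inference_on_dataset(predictor.model, val_loader, evaluator))
```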
[12/13 21:38:57 d2.data.dataset_mapper]: [DatasetMapper] Augmentations used in inference: [ResizeShortestEdge(short_edge_length=(800, 800), max_size=1333, sample_style='choice')]
[12/13 21:38:57 d2.data.datasets.coco]: Loaded 3 images in COCO format from /content/drive/MyDrive/datasets/lettuce/annotations/lettuce_2020_val.json
[12/13 21:38:57 d2.data.common]: Serializing 3 elements to byte tensors and concatenating them all ...
[12/13 21:38:57 d2.data.common]: Serialized dataset takes 0.08 MiB
[12/13 21:38:57 d2.evaluation.evaluator]: Start inference on 3 images
[12/13 21:39:00 d2.evaluation.evaluator]: Total inference time: 0:00:00.667350 (0.667350 s / img per device, on 1 devices)
[12/13 21:39:00 d2.evaluation.evaluator]: Total inference pure compute time: 0:00:00 (0.184663 s / img per device, on 1 devices)
[12/13 21:39:00 d2.evaluation.coco_evaluation]: Preparing results for COCO format ...
[12/13 21:39:00 d2.evaluation.coco_evaluation]: Saving results to ./output/inference/coco_instances_results.json
[12/13 21:39:00 d2.evaluation.coco_evaluation]: Evaluating predictions with unofficial COCO API...
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type bbox
COCOeval_opt.evaluate() finished in 0.00 seconds.
Accumulating evaluation results...
COCOeval_opt.accumulate() finished in 0.00 seconds.
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.869
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.869
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.048
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.498
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.898
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.898
[12/13 21:39:00 d2.evaluation.coco_evaluation]: Evaluation results for bbox:
|   AP   |  AP50   |  AP75   |  APs  |  APm  |  APl   |
|:------:|:-------:|:-------:|:-----:|:-----:|:------:|
| 86.917 | 100.000 | 100.000 |  nan  |  nan  | 86.917 |
[12/13 21:39:00 d2.evaluation.coco_evaluation]: Some metrics cannot be computed and is shown as NaN.
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type segm
COCOeval_opt.evaluate() finished in 0.01 seconds.
Accumulating evaluation results...
COCOeval_opt.accumulate() finished in 0.00 seconds.
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.887
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.887
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.050
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.502
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.902
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.902
[12/13 21:39:00 d2.evaluation.coco_evaluation]: Evaluation results for segm:
Detailed steps to reproduce
System information
Google Colab Notebook