I've noticed that in the mAP calculation for object detection models, the "gt" and "preds" dictionaries are built one per batch instead of one per image (train.py, ln 1202 - 1218).
From the official documentation (https://lightning.ai/docs/torchmetrics/stable/detection/mean_average_precision.html):
"preds (List): A list consisting of dictionaries each containing the key-values (each dictionary corresponds to a single image)"
"target (List): A list consisting of dictionaries each containing the key-values (each dictionary corresponds to a single image)"
Also, a filter is first applied to the model's output to remove image predictions containing only 'background' (train.py, ln 1192). As a result, if no boxes are detected in a given batch, the model outputs only background, every box is filtered out, and map_calculator.update is never called for that batch - so the missed detections never decrease the mAP.
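A minimal sketch of what I'd expect instead: one dict per image, and an image with zero surviving detections still contributes empty tensors so update is always called. The stub class below is hypothetical (it just stands in for torchmetrics' MeanAveragePrecision to keep the example self-contained); the dict keys ("boxes", "scores", "labels") follow the documentation quoted above.

```python
import torch

# Hypothetical stand-in for torchmetrics' MeanAveragePrecision: it only
# records how many per-image dicts each update() call receives.
class MapCalculatorStub:
    def __init__(self):
        self.num_pred_dicts = 0

    def update(self, preds, target):
        # torchmetrics expects parallel lists of per-image dicts
        assert len(preds) == len(target)
        self.num_pred_dicts += len(preds)

def batch_to_per_image_dicts(batch_boxes, batch_scores, batch_labels):
    """Build one prediction dict per image, as the torchmetrics docs describe.

    Each batch_* argument is a list with one tensor per image. An image
    whose predictions were all filtered (e.g. only 'background') gets
    empty tensors instead of being dropped, so the metric still sees it
    and the missed ground-truth boxes lower the mAP.
    """
    return [
        {"boxes": b, "scores": s, "labels": l}
        for b, s, l in zip(batch_boxes, batch_scores, batch_labels)
    ]

# Two-image batch: image 0 has one detection, image 1 has none
# (all of its predictions were 'background' and were filtered out).
preds = batch_to_per_image_dicts(
    [torch.tensor([[10.0, 10.0, 50.0, 50.0]]), torch.zeros((0, 4))],
    [torch.tensor([0.9]), torch.zeros((0,))],
    [torch.tensor([1]), torch.zeros((0,), dtype=torch.long)],
)
gt = [
    {"boxes": torch.tensor([[12.0, 12.0, 48.0, 48.0]]), "labels": torch.tensor([1])},
    {"boxes": torch.tensor([[5.0, 5.0, 20.0, 20.0]]), "labels": torch.tensor([2])},
]

calc = MapCalculatorStub()
calc.update(preds, gt)  # one call per batch, but one dict per image inside it
```

With this structure, the batch with only background predictions still reaches the metric (as two dicts, one of them empty) rather than being skipped entirely.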