mAP calculation for object_detection

Hi,

I've noticed that, in the mAP calculation for object detection models, the "gt" and "preds" dictionaries are generated once per batch instead of once per image (train.py, lines 1202-1218). The official torchmetrics documentation (https://lightning.ai/docs/torchmetrics/stable/detection/mean_average_precision.html) says each dictionary should correspond to a single image:

"preds (List): A list consisting of dictionaries each containing the key-values (each dictionary corresponds to a single image)"

"target (List): A list consisting of dictionaries each containing the key-values (each dictionary corresponds to a single image)"
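To illustrate the layout the documentation describes, here is a minimal sketch of building one dictionary per image. The helper name `build_per_image_preds` and its arguments are hypothetical and not taken from train.py. It assumes the per-image boxes, scores, and labels are already available as separate tensors.

```python
import torch


def build_per_image_preds(boxes_per_image, scores_per_image, labels_per_image):
    """Return one prediction dict per image (hypothetical helper).

    Each argument is a list with one tensor per image:
    boxes (N_i, 4), scores (N_i,), labels (N_i,).
    """
    preds = []
    for boxes, scores, labels in zip(boxes_per_image, scores_per_image, labels_per_image):
        # One dict per image, not one dict for the whole batch.
        preds.append({"boxes": boxes, "scores": scores, "labels": labels})
    return preds


# Example batch of two images: one detection in the first, none in the second.
boxes = [torch.tensor([[0.0, 0.0, 10.0, 10.0]]), torch.zeros((0, 4))]
scores = [torch.tensor([0.9]), torch.zeros((0,))]
labels = [torch.tensor([1]), torch.zeros((0,), dtype=torch.int64)]

preds = build_per_image_preds(boxes, scores, labels)
```

The resulting list has as many entries as the batch has images. The target list would be built the same way with "boxes" and "labels" keys, and both lists would then be passed to the metric's `update` call.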

Also, a filter is first applied to the model's output to remove image predictions that contain only 'background' (train.py, line 1192). As a result, if no boxes are detected in a given batch, the model outputs only background, every box is filtered out, and map_calculator.update is never called for that batch. The missed detections therefore never lower the mAP.
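One possible way to make missed detections count is to call `update` for every batch, passing empty tensors for images whose predictions were all filtered out. The sketch below is an assumption about how that could look, not the code in train.py. `empty_prediction` and `update_with_empty_fallback` are hypothetical names, and `None` is used here as a stand-in for "nothing survived the filter". As far as I know, torchmetrics' `MeanAveragePrecision` accepts empty prediction tensors, but that should be verified against the installed version.

```python
import torch


def empty_prediction():
    """Prediction dict for an image with no detections (hypothetical helper)."""
    return {
        "boxes": torch.zeros((0, 4), dtype=torch.float32),
        "scores": torch.zeros((0,), dtype=torch.float32),
        "labels": torch.zeros((0,), dtype=torch.int64),
    }


def update_with_empty_fallback(metric, filtered_preds, targets):
    """Always call metric.update, even if every box was filtered out.

    filtered_preds holds one entry per image; an entry is None when the
    background filter removed all of that image's predictions.
    """
    preds = [p if p is not None else empty_prediction() for p in filtered_preds]
    # Ground-truth boxes with no matching prediction now count as misses.
    metric.update(preds, targets)
    return preds
```

Here `metric` would be the existing map_calculator object. With this approach, a batch with no detections still contributes its ground-truth boxes to the metric instead of being skipped.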

analogdevicesinc / ai8x-training

mAP calculation for object_detection #336