bertsky opened 2 years ago
Yes, that paper provided the idea for the oversegmentation and undersegmentation measures, but only for these two (not the others), and I took the liberty of deviating from the exact definition of Zhang et al. 2021:
https://github.com/OCR-D/ocrd_segment/blob/81923495648c346a84436fb7d08727d9c13eb88d/ocrd_segment/evaluate.py#L440-L444
So in my implementation these measures are merely raw ratios, i.e. the share of regions in GT and DT which have been oversegmented (or undersegmented, resp.).
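A minimal Python sketch of such raw ratios, assuming the matches are already given as `(gt_id, dt_id)` pairs (the matching criterion itself is not reproduced here). It reads "share of regions in GT and DT" as: every GT region matched by several detections plus those detections (and symmetrically for undersegmentation), counted over all GT and DT regions together. That reading and the name `segmentation_ratios` are assumptions for illustration; the linked `evaluate.py` remains the authority.

```python
from collections import defaultdict


def segmentation_ratios(matches, n_gt, n_dt):
    """Raw oversegmentation and undersegmentation ratios (hypothetical helper).

    matches: iterable of (gt_id, dt_id) pairs accepted as matches.
    n_gt, n_dt: total number of GT and DT regions (matched or not).
    """
    gt2dt = defaultdict(set)
    dt2gt = defaultdict(set)
    for gt, dt in matches:
        gt2dt[gt].add(dt)
        dt2gt[dt].add(gt)

    # oversegmentation (assumed reading): a GT region matched by several
    # detections, plus each of those detections; tagged to avoid id clashes
    over = set()
    for gt, dts in gt2dt.items():
        if len(dts) > 1:
            over.add(("gt", gt))
            over.update(("dt", dt) for dt in dts)

    # undersegmentation (assumed reading): a detection matched to several
    # GT regions, plus each of those GT regions
    under = set()
    for dt, gts in dt2gt.items():
        if len(gts) > 1:
            under.add(("dt", dt))
            under.update(("gt", gt) for gt in gts)

    total = n_gt + n_dt
    return len(over) / total, len(under) / total


# GT region 0 is split into detections a, b, d; detection c merges GT regions 1 and 2
over, under = segmentation_ratios(
    [(0, "a"), (0, "b"), (0, "d"), (1, "c"), (2, "c")], n_gt=3, n_dt=4)
print(over, under)  # 4/7 and 3/7
```

Counting both sides over `n_gt + n_dt` keeps each ratio in [0, 1] regardless of how many detections a single GT region is split into.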
My notion of a match is somewhat arbitrary, but IMO more adequate than averaging over different IoU thresholds for various confidence thresholds:
(All area values under consideration are numbers of pixels in the polygon-masked segments, not just bounding box sizes.)
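To illustrate how far polygon-masked pixel counts can differ from bounding-box sizes, here is a hedged pure-Python sketch that rasterizes a polygon by testing pixel centres with the even-odd rule. The helpers `polygon_mask` and `bbox_area` and the pixel-centre convention are assumptions for illustration; a real implementation would more likely use an image library for rasterization.

```python
def polygon_mask(polygon, width, height):
    """Set of (x, y) pixels whose centres lie inside the polygon (even-odd rule)."""
    pixels = set()
    n = len(polygon)
    for y in range(height):
        cy = y + 0.5
        for x in range(width):
            cx = x + 0.5
            inside = False
            for i in range(n):
                x1, y1 = polygon[i]
                x2, y2 = polygon[(i + 1) % n]
                # does this edge cross the horizontal line through the pixel centre?
                if (y1 > cy) != (y2 > cy):
                    x_cross = x1 + (cy - y1) * (x2 - x1) / (y2 - y1)
                    if cx < x_cross:
                        inside = not inside
            if inside:
                pixels.add((x, y))
    return pixels


def bbox_area(polygon):
    """Area of the axis-aligned bounding box around the polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


triangle = [(0, 0), (10, 0), (0, 10)]
print(len(polygon_mask(triangle, 10, 10)), bbox_area(triangle))  # 45 100
```

For this triangular region the masked area is less than half of its bounding box, so any overlap ratio computed on boxes would be heavily inflated.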
So in all, you get the following metrics here:
- IoU: intersection over union, i.e. the share of the overlapping area of a match over the union of the true and the predicted region
- IoGT: intersection over ground truth, i.e. the share of the overlapping area of a match over the total area of the true region
- IoDT: intersection over detection, i.e. the share of the overlapping area of a match over the total area of the predicted region
- pixel-recall: page-wise aggregate of intersection over GT including missed true regions (FN), i.e. the share of the overlapping areas over the total area of true regions in a page
- pixel-precision: page-wise aggregate of intersection over DT including fake predicted regions (FP), i.e. the share of the overlapping areas over the total area of predicted regions in a page
- oversegmentation: share of true and predicted regions which have been oversegmented (i.e. where true regions match multiple detections) over all regions
- undersegmentation: share of true and predicted regions which have been undersegmented (i.e. where predicted regions match multiple ground truths) over all regions
- recall: ratio of matches (TP) over true regions, i.e. share of correctly predicted regions in total GT
- precision: ratio of matches (TP) over detected regions, i.e. share of correctly predicted regions in total DT

For each metric, there is a page-wise (or even segment-wise) and an aggregated measure; the latter always uses micro-averaging over all (matching pairs in all) pages.
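The metrics above and their micro-averaged aggregate can be sketched as follows. This is not the `evaluate.py` code: it assumes regions are given as sets of `(x, y)` pixels, that matches are already decided, that regions do not overlap within GT or within DT (so pairwise intersections can be summed), and that recall/precision count the GT/DT regions with at least one match. All function names are hypothetical. Each page-wise metric is kept as a (numerator, denominator) pair, so aggregating across pages sums the counts (micro-averaging) instead of averaging per-page ratios.

```python
def pair_measures(gt_mask, dt_mask):
    """Segment-wise IoU, IoGT and IoDT for one matching pair of pixel sets."""
    inter = len(gt_mask & dt_mask)
    return {
        "IoU": inter / len(gt_mask | dt_mask),
        "IoGT": inter / len(gt_mask),
        "IoDT": inter / len(dt_mask),
    }


def page_counts(gt, dt, matches):
    """(numerator, denominator) pairs for one page.

    gt, dt: dicts mapping a region id to its set of (x, y) pixels.
    matches: list of (gt_id, dt_id) pairs already accepted as matches.
    """
    inter = sum(len(gt[g] & dt[d]) for g, d in matches)
    return {
        # pooled over the matching pairs of this page
        "IoU": (inter, sum(len(gt[g] | dt[d]) for g, d in matches)),
        "IoGT": (inter, sum(len(gt[g]) for g, _ in matches)),
        "IoDT": (inter, sum(len(dt[d]) for _, d in matches)),
        # denominators include unmatched regions (FN / FP)
        "pixel-recall": (inter, sum(len(m) for m in gt.values())),
        "pixel-precision": (inter, sum(len(m) for m in dt.values())),
        "recall": (len({g for g, _ in matches}), len(gt)),
        "precision": (len({d for _, d in matches}), len(dt)),
    }


def micro_average(pages):
    """Sum numerators and denominators over all pages, then divide."""
    totals = {}
    for counts in pages:
        for name, (num, den) in counts.items():
            n, d = totals.get(name, (0, 0))
            totals[name] = (n + num, d + den)
    return {name: (n / d if d else float("nan")) for name, (n, d) in totals.items()}


def row(x0, x1, y=0):
    """Toy region: a horizontal run of pixels [x0, x1) on line y."""
    return {(x, y) for x in range(x0, x1)}


page1 = page_counts({1: row(0, 10)}, {"a": row(0, 8)}, [(1, "a")])
page2 = page_counts({1: row(0, 10), 2: row(0, 10, y=1)},
                    {"b": row(0, 10), "c": row(0, 5, y=5)}, [(1, "b")])
totals = micro_average([page1, page2])
print(totals["pixel-recall"], totals["recall"])  # 0.6 0.6666666666666666
```

With these two toy pages, the micro-averaged pixel-recall is 18/30 = 0.6, whereas averaging the per-page values (0.8 and 0.5) would give 0.65; pages with more ground-truth area weigh more.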
Originally posted by @andreaceruti in https://github.com/cocodataset/cocoapi/issues/564#issuecomment-1064223428