Underestimation of SQ?

When SQ is averaged over classes, classes with TP=0 (i.e. classes for which no segment was matched correctly) are included as well. For those classes SQ=0, so they pull the average SQ down considerably. For me, at least, this breaks the intuition that SQ measures how well the predicted segments match the ground truth. Consider this example:

class	IOU	TP	FP	FN	PQ	SQ	RQ
class 1	0.88	1	0	1	0.59	0.88	0.67
class 2	0.69	1	0	0	0.69	0.69	1
class 3	0.63	1	0	0	0.63	0.63	1
class 4	0	0	1	1	0	0	0
class 5	0	0	1	0	0	0	0
average	-	-	-	-	0.38	0.44	0.53

Even though the per-class segmentation results are decent for the first three classes, the average SQ over all classes is only 0.44, which seems unfair. Given how SQ is calculated (a segment only counts as a match when its IoU exceeds 0.5), one could wonder how SQ can ever be smaller than 0.5. A simple solution would be to count only non-zero SQ values when averaging; in this case the average SQ would be 0.73.
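To make the two averaging rules concrete, here is a minimal sketch that recomputes the numbers from the table. It is not the panopticapi code; the helper `per_class_metrics` and the dictionary layout are made up for illustration, and the IOU column is assumed to hold the sum of IoUs over the matched (TP) segments of each class.

```python
# Per-class statistics copied from the table: (IoU sum over TPs, TP, FP, FN).
# Assumption: the "IOU" column is the summed IoU of matched segments.
classes = {
    "class 1": (0.88, 1, 0, 1),
    "class 2": (0.69, 1, 0, 0),
    "class 3": (0.63, 1, 0, 0),
    "class 4": (0.0, 0, 1, 1),
    "class 5": (0.0, 0, 1, 0),
}


def per_class_metrics(iou_sum, tp, fp, fn):
    """Return (PQ, SQ, RQ) for one class; hypothetical helper, not the library API."""
    sq = iou_sum / tp if tp > 0 else 0.0
    denom = tp + 0.5 * fp + 0.5 * fn
    rq = tp / denom if denom > 0 else 0.0
    return sq * rq, sq, rq


sq_values = [per_class_metrics(*stats)[1] for stats in classes.values()]

# Current rule: average over every class, including those with TP=0.
mean_sq_all = sum(sq_values) / len(sq_values)

# Proposed rule: average only over classes with a non-zero SQ.
matched = [sq for sq in sq_values if sq > 0]
mean_sq_matched = sum(matched) / len(matched)

print(round(mean_sq_all, 2), round(mean_sq_matched, 2))  # 0.44 0.73
```

With the proposed rule the average can no longer drop below 0.5 as long as at least one class has a match, which is closer to the intuition described above.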

At first I thought that this would break the nice PQ=SQ*RQ formula. But then I realized that averaging breaks it anyway. I understand that this raises the question of how to average PQ as well, and it would be nice if the rules were consistent. That's why I'm posting this as an issue to discuss rather than as a pull request.
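As a quick check of the claim that averaging already breaks PQ=SQ*RQ, the sketch below uses the rounded per-class values from the table. The variable names are chosen here for illustration only.

```python
# Rounded per-class values taken from the table (classes 1 to 5).
pq = [0.59, 0.69, 0.63, 0.0, 0.0]
sq = [0.88, 0.69, 0.63, 0.0, 0.0]
rq = [0.67, 1.0, 1.0, 0.0, 0.0]


def mean(values):
    """Plain arithmetic mean over all classes."""
    return sum(values) / len(values)


mean_pq = mean(pq)                     # average of per-class PQ
product_of_means = mean(sq) * mean(rq) # average SQ times average RQ

print(round(mean_pq, 2), round(product_of_means, 2))  # 0.38 0.23
```

The identity holds per class, but the mean of the per-class products is not the product of the means, so the averaged PQ, SQ and RQ do not satisfy it under the current rule either.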

cocodataset / panopticapi

Underestimation of SQ? #14