When SQ is averaged over classes, classes with TP=0 (i.e. classes for which no segment was matched correctly) are still included in the average. For those classes SQ=0, so they pull the average SQ down considerably. At least for me, this breaks the intuition that SQ measures how well the segmentation matches the ground truth. Consider this example:
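
| class   | IOU  | TP | FP | FN | PQ   | SQ   | RQ   |
|---------|------|----|----|----|------|------|------|
| class 1 | 0.88 | 1  | 0  | 1  | 0.59 | 0.88 | 0.67 |
| class 2 | 0.69 | 1  | 0  | 0  | 0.69 | 0.69 | 1    |
| class 3 | 0.63 | 1  | 0  | 0  | 0.63 | 0.63 | 1    |
| class 4 | 0    | 0  | 1  | 1  | 0    | 0    | 0    |
| class 5 | 0    | 0  | 1  | 0  | 0    | 0    | 0    |
| average |      |    |    |    | 0.38 | 0.44 | 0.53 |
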
Even though the per-class segmentation results are decent for the first three classes, the average over all classes is 0.44, which seems unfair. Since a segment only counts as a TP when its IoU exceeds 0.5, every class with at least one TP has SQ > 0.5, so one could wonder how the average SQ can ever be smaller than 0.5. A simple solution would be to count only non-zero SQ results when averaging; in this case the average SQ would be 0.73.
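To make the two averaging rules concrete, here is a minimal sketch that reproduces both numbers from the example. The function names `mean_all` and `mean_matched` are made up for illustration and are not taken from any evaluation code; the values are copied from the table.

```python
# Per-class SQ and TP counts copied from the example table (classes 1 to 5).
sq = [0.88, 0.69, 0.63, 0.0, 0.0]
tp = [1, 1, 1, 0, 0]


def mean_all(values):
    """Average over every class, including classes with TP=0."""
    return sum(values) / len(values)


def mean_matched(values, tp_counts):
    """Average only over classes with at least one TP (hypothetical rule)."""
    kept = [v for v, n in zip(values, tp_counts) if n > 0]
    return sum(kept) / len(kept) if kept else 0.0


print(round(mean_all(sq), 2))          # 0.44
print(round(mean_matched(sq, tp), 2))  # 0.73
```

Filtering on TP > 0 rather than on SQ != 0 gives the same result here, and it states the intent more directly: a class contributes to SQ only if something was matched.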
At first I thought that this would break the nice PQ=SQ*RQ formula, but then I realized that averaging breaks it anyway. I understand that this raises the question of how to average PQ as well, and it would be nice if the rules were consistent. That's why I'm posting this as an issue for discussion rather than as a pull request.
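As a quick check of the claim that averaging already breaks the identity, the sketch below compares the averaged PQ with the product of the averaged SQ and RQ, using the rounded per-class values from the table. The variable names are only illustrative.

```python
# Per-class values copied from the example table (classes 1 to 5).
pq = [0.59, 0.69, 0.63, 0.0, 0.0]
sq = [0.88, 0.69, 0.63, 0.0, 0.0]
rq = [0.67, 1.0, 1.0, 0.0, 0.0]


def mean(values):
    return sum(values) / len(values)


# Per class, PQ = SQ * RQ holds up to the rounding used in the table.
per_class_ok = all(abs(p - s * r) < 0.01 for p, s, r in zip(pq, sq, rq))

mean_pq = mean(pq)                 # average of the per-class PQ values
product = mean(sq) * mean(rq)      # product of the averaged SQ and RQ

print(per_class_ok)                # True
print(round(mean_pq, 2))           # 0.38
print(round(product, 2))           # 0.23
```

The identity holds for each class but not for the averages, because the mean of products is generally not the product of means.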