Hi,
The evaluation code follows the same protocol as the ICDAR evaluation metric. You can also refer to https://github.com/Yuliang-Liu/TIoU-metric/tree/master/curved-tiou, which produces exactly the same result.
It would be appreciated if you could explain why calculating the result for each image, as described in the paper, would be better. We can discuss it.
As for MLT, they also accumulate over all the test images, as I did (the per-image description in their MLT 2017 paper is a mistake). Here is part of the text from the MLT official email:
"The recall, precision and f-measure are NOT calculated for each image individually. They are computed based on the detected boxes in all the images (of course the boxes are matched/processed image by image). There was a confusion because in the paper of MLT-2017, there was a mistake in describing the evaluation protocol (in the paper, it is mentioned that the f-measure is computed per image and then averaged across the images -- this is not what we did)."
Thanks
Wow! Thank you so much. Incredible! I found the correction at the bottom of their website.
Btw, about the two methods: the only advantage I came up with for the method in your code, accumulating through all test images (let's call it method 1), is that it seems less sensitive to outliers.
E.g., take `a = np.array([4, 6, 1, 2, 8, 18])` and `b = np.array([5, 8, 6, 10, 11, 22])`, and treat the 1 and 2 in `a` as outliers/bad samples. Then method 1 gives 0.629, while method 2 gives 0.577.
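A minimal sketch reproducing those two numbers (assuming `a` holds per-image matched counts and `b` the per-image totals, which is just my reading of the toy example):

```python
import numpy as np

# Toy per-image counts (my reading of the example above):
# a[i] = matched detections in image i, b[i] = total in image i.
a = np.array([4, 6, 1, 2, 8, 18])
b = np.array([5, 8, 6, 10, 11, 22])

# Method 1: accumulate counts over all images, take one global ratio.
method1 = a.sum() / b.sum()   # 39 / 62 ≈ 0.629

# Method 2: compute the ratio per image, then average across images.
method2 = (a / b).mean()      # ≈ 0.577

print(f"Method 1: {method1:.3f}, Method 2: {method2:.3f}")
```

The two bad images (1/6 and 2/10) drag the per-image average down much more than they drag the global ratio.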
Is this the reason we don't use method 2? I still feel that the unit for testing a model is a single image, so we should calculate metrics per image. I don't know, maybe there are some statistical tricks?
Thank you again
Yes, outliers are one possible reason. See the figure below.
In my opinion, method 1 is better. So far, I have not found any dataset in the literature that is evaluated with method 2. This is only my personal view; if you want to dig further, I am sure there are more theoretical explanations in previous works.
Best regards
Yeap. Pretty cool. Thank you so much for your reply. It's been a joyful discussion.
You are welcome. Thanks for your attention.
Hi,
Thank you so much for your repository!
In `voc_eval_polygon.py`, it seems that you are accumulating `tp, fp` through all the test images, and then calculating `prec, rec` at the end. Let me know if I misunderstand it. But I think, at least in the paper, we should calculate precision and recall for each image, and then take the average of these per-image values.
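In pseudocode terms, what I mean by the accumulation is roughly the following (a hypothetical sketch with a made-up input format, not the actual code in `voc_eval_polygon.py`):

```python
def evaluate_method1(per_image_counts):
    """Accumulate counts globally, compute precision/recall once at the end.

    per_image_counts: list of (tp_i, fp_i, n_gt_i) tuples, one per image,
    produced by the per-image polygon matching (hypothetical input format).
    """
    tp = sum(c[0] for c in per_image_counts)    # matched detections
    fp = sum(c[1] for c in per_image_counts)    # unmatched detections
    n_gt = sum(c[2] for c in per_image_counts)  # ground-truth polygons

    prec = tp / (tp + fp)  # one global precision, not a per-image average
    rec = tp / n_gt        # one global recall, not a per-image average
    return prec, rec

# Example with three images, each contributing (tp, fp, n_gt):
print(evaluate_method1([(4, 1, 5), (6, 2, 8), (1, 5, 6)]))
```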
Thanks