Hi, when I used the dataset (English/Latin) you provided, I found that some JSON annotations are wrong, such as 749 ~ 753 in sub_48. Also, why are some coordinates negative? Looking forward to your reply.
There are indeed a small proportion of wrong labels. They can result from two causes:
(1) The scene models used to generate the data have disproportionate sizes or badly crafted object meshes. As a result, the text regions may be too small, and sometimes the rendering process fails. Such failures are characterized by extremely small text regions, so we can simply exclude them from loss calculation. In fact, I filtered out text regions with a size threshold (shorter side < 10 pixels) in all my experiments. The 10-pixel threshold is also used in some papers even when they train on real-world images.
(2) Negative coordinates mean the text region falls outside the screen. This usually does not happen, but with a bad scene model it is possible. The advice here is the same: just ignore them, or exclude them from loss calculation (see the sketch below).
Overall, the proportion of problematic annotations should be negligible; otherwise, training models on them would result in disastrous performance :)
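To make this concrete, here is a minimal Python sketch of the filtering described above. The `"polygon"` key and the overall JSON layout are assumptions for illustration, so adapt them to the actual annotation format:

```python
import json

import numpy as np


def filter_annotations(ann_path, img_w, img_h, min_short_side=10):
    """Drop text regions that are too small or lie outside the image.

    Assumes each record stores its polygon under a "polygon" key as a
    list of (x, y) points -- adapt the key names to the real JSON layout.
    """
    with open(ann_path) as f:
        anns = json.load(f)

    kept = []
    for ann in anns:
        pts = np.asarray(ann["polygon"], dtype=np.float32)
        # Negative or out-of-range vertices mean the region was rendered
        # (partly) off screen; exclude it from loss calculation.
        if (pts < 0).any() or (pts[:, 0] >= img_w).any() or (pts[:, 1] >= img_h).any():
            continue
        # Extremely small regions typically come from failed renderings;
        # filter by the shorter side of the axis-aligned bounding box.
        w = pts[:, 0].max() - pts[:, 0].min()
        h = pts[:, 1].max() - pts[:, 1].min()
        if min(w, h) < min_short_side:
            continue
        kept.append(ann)
    return kept
```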
Thanks :)
Hi, I have handled it according to your method, but I still find some problems. For example, there are many pure-black images in sub_23, and some text still does not appear in the image even after filtering with the 10-pixel threshold.
That happens for the same reason. When I trained my own models, I just ignored these errors; there are only a few of them.
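If you want to drop the pure-black images automatically, a simple check like the following should work (the `sub_23` directory layout and `.jpg` extension here are assumptions):

```python
from pathlib import Path

import numpy as np
from PIL import Image


def is_black_image(path, tol=0):
    """Return True if every pixel is (near) black, i.e. a failed render."""
    img = np.asarray(Image.open(path).convert("L"))
    return img.max() <= tol


# Example: list the unusable images in one sub-directory.
bad = [p for p in Path("sub_23").glob("*.jpg") if is_black_image(p)]
```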