ankush-me / SynthText

Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.
http://www.robots.ox.ac.uk/~vgg/data/scenetext/
Apache License 2.0

The height of bounding boxes for some words is much larger than the words #25

Open cjnolet opened 7 years ago

cjnolet commented 7 years ago

This is causing problems when determining the bounding-box midpoints for the fully convolutional network. It puts the midpoint in a different spot than it would be if the bounding box tightly enclosed the text. Any ideas?
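To make the effect concrete, here is a minimal sketch of how an over-tall box moves the midpoint. The coordinates are made up for illustration and `box_midpoint` is a hypothetical helper, not part of SynthText.

```python
def box_midpoint(x_min, y_min, x_max, y_max):
    """Return the (x, y) centre of an axis-aligned box."""
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


# Hypothetical numbers: the word itself spans rows 40..60,
# but the reported box extends down to row 90.
tight_mid = box_midpoint(10, 40, 110, 60)
loose_mid = box_midpoint(10, 40, 110, 90)

# How far the regression target moves vertically because of the extra height.
vertical_shift = loose_mid[1] - tight_mid[1]
print(tight_mid, loose_mid, vertical_shift)
```

With these numbers the midpoint moves down by half of the extra height, which can be enough to push it into a different grid cell when the cells are small.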

ankush-me commented 7 years ago

This happens for some fonts when they are "underlined" -- this increases the glyph height as the "underline" position is set at the lowest point, e.g., below the end of p. This problem should go away if you don't use the underlined glyphs.

cjnolet commented 7 years ago

I am not using any underlining, at least as far as I can tell. I set the underline probability to 0.0.

ankush-me commented 7 years ago

Not sure then. Do you know if this is a problem with specific fonts?

cjnolet commented 7 years ago

I'm checking into it now.

Another thing I'm trying to do is use your FCRN concept to create a text detection framework, whereby I only use the "c" portion of your FCRN but determine whether the cell contains text at all rather than the midpoint of a bounding box. I'm currently looking through the SynthText Python code to see if there's a good way to get hold of the actual text pixels in a separate surface/image (to use as a mask) for labelling my "detection FCRN". Offhand, I see where the text is rendered to what looks like a blank pygame surface. Hopefully I can use this to extract the mask and label my 32x32 grid for my training set.

ankush-me commented 7 years ago

You can get these masks from place_masks and collision_mask.

cjnolet commented 7 years ago

Ah, so this is actually a binary mask of the text drawn on the image? That's exactly what I need: a layer with only the text (after it's been warped), so that I can determine which cells (32x32 grid, just like your FCRN implementation) actually contain text.
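The cell-labelling step described above could look roughly like the sketch below. It assumes the mask is available as a 2D list of values in [0, 1] with the same size as the image. `label_grid` and its parameters are hypothetical names, not SynthText functions, and the "any pixel above a threshold" rule is just one possible choice.

```python
def label_grid(mask, grid_rows, grid_cols, threshold=0.5):
    """Mark a grid cell True if any mask pixel inside it exceeds threshold."""
    height = len(mask)
    width = len(mask[0])
    labels = [[False] * grid_cols for _ in range(grid_rows)]
    for y, row in enumerate(mask):
        # Map the pixel row to a grid row; integer arithmetic also
        # handles image sizes that do not divide evenly by the grid.
        gy = y * grid_rows // height
        for x, value in enumerate(row):
            if value > threshold:
                gx = x * grid_cols // width
                labels[gy][gx] = True
    return labels


# Toy example: an 8x8 mask with a small text blob, labelled on a 2x2 grid.
mask = [[0.0] * 8 for _ in range(8)]
for y in range(1, 3):
    for x in range(5, 8):
        mask[y][x] = 1.0

labels = label_grid(mask, 2, 2)
print(labels)
```

For the use case in this thread the call would be `label_grid(mask, 32, 32)`. If the mask has soft edges, the threshold controls how much of the fringe counts as text.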

cjnolet commented 7 years ago

I got it! Looking at the feathered mask. Perfect, Ankush. Thanks again!

codecolony commented 6 years ago

@cjnolet @ankush-me

I'm facing the same bounding-box height issue for a few of the text boxes. Pretty mysterious. I'm trying to debug the whole thing to zero in on the issue. Meanwhile, does anyone already know the cause?

crazysal commented 5 years ago

@cjnolet I have a similar use case. Please advise:

  1. How do I extract the binary masks? Can I generate them per character or per word?
  2. How do I find the absolute center point of each bbox?
  3. How do I calculate the width of each character? Can it be taken as the width of the character-level bbox?
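
For questions 2 and 3, a minimal sketch under one assumption: each box is given as four (x, y) corners ordered top-left, top-right, bottom-right, bottom-left. That ordering and the helper names are assumptions for illustration, so check them against the data you actually have.

```python
import math


def box_center(corners):
    """Average of the four corner points (also works for rotated boxes)."""
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return (sum(xs) / 4.0, sum(ys) / 4.0)


def edge_length(p, q):
    """Euclidean distance between two points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def box_width_height(corners):
    """Mean length of opposite edges, assuming TL, TR, BR, BL corner order."""
    tl, tr, br, bl = corners
    width = (edge_length(tl, tr) + edge_length(bl, br)) / 2.0
    height = (edge_length(tl, bl) + edge_length(tr, br)) / 2.0
    return width, height


# Hypothetical character box, in pixel coordinates.
char_box = [(10.0, 20.0), (22.0, 20.0), (22.0, 44.0), (10.0, 44.0)]
center = box_center(char_box)
width, height = box_width_height(char_box)
print(center, width, height)
```

Whether the box width matches the glyph's true width depends on how tightly the generator fits the box, which this thread does not settle.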