Output character-level localization

Belval / TextRecognitionDataGenerator

A synthetic data generator for text recognition

MIT License

Output character-level localization #107

Open Belval opened 4 years ago

Belval commented 4 years ago

When generating images, it would be interesting to output the bounding box/mask of each character, to train localization models.

There are two possible implementations:

Output bounding boxes in a JSON file
Output a mask file with a specific pixel value for each character

The MVP would work with images that are neither skewed nor blurred, but ideally it should work for any configuration.

Ownmarc commented 4 years ago

Yep, that would be interesting. Need help with that?

Belval commented 4 years ago

I already implemented the mask option, which outputs a mask with a different pixel value for each character. It's not ready to merge yet, because tests are missing and the handwritten generators are not handled.

I will commit what I have so you can take a look. I also want to implement the format used in this paper, if you would like to try it: https://arxiv.org/pdf/1904.01941.pdf

Here is an example of the current output:

[Image: outwriggled three-in-hand long-standing Arvida deaccessioning_0]

It is rather hard to see, but each character's color is treated as a "label" with the RGB value incremented for each character drawn.

Here you'd have (0, 0, 1), (0, 0, 2), ... (0, 0, 255), (0, 1, 0) etc...
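The increment scheme above can be sketched as a pair of conversions between a character index and its RGB value. This is only an illustration of the arithmetic, not the generator's actual code; the function names are hypothetical, and the channel order (blue counting fastest) is inferred from the sequence listed above.

```python
def label_to_rgb(label: int) -> tuple[int, int, int]:
    """Split a character index into (R, G, B), with blue counting fastest."""
    if not 0 <= label < 256 ** 3:
        raise ValueError("label must fit in 24 bits")
    return (label >> 16) & 0xFF, (label >> 8) & 0xFF, label & 0xFF


def rgb_to_label(rgb: tuple[int, int, int]) -> int:
    """Inverse of label_to_rgb: pack (R, G, B) back into one integer."""
    r, g, b = rgb
    return (r << 16) | (g << 8) | b


print(label_to_rgb(1))    # (0, 0, 1)
print(label_to_rgb(255))  # (0, 0, 255)
print(label_to_rgb(256))  # (0, 1, 0)
```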

To create bounding boxes around each character I think skimage could be used: https://muthu.co/draw-bounding-box-around-contours-skimage/
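As a rough sketch of the bounding-box idea, the boxes can also be read straight off the mask with NumPy alone, without skimage. The function name is hypothetical, and the code assumes that (0, 0, 0) is the background and that each character's pixels carry exactly one label value (no anti-aliasing or blur mixing neighbouring values).

```python
import numpy as np


def char_bounding_boxes(mask_rgb: np.ndarray) -> dict[int, tuple[int, int, int, int]]:
    """Return {label: (left, top, right, bottom)} in pixels; right/bottom are exclusive."""
    m = mask_rgb.astype(np.int64)
    # Pack the three channels into one integer label per pixel.
    labels = (m[..., 0] << 16) | (m[..., 1] << 8) | m[..., 2]
    boxes = {}
    for label in np.unique(labels):
        if label == 0:  # assumption: (0, 0, 0) is background
            continue
        rows, cols = np.nonzero(labels == label)
        boxes[int(label)] = (
            int(cols.min()), int(rows.min()),
            int(cols.max()) + 1, int(rows.max()) + 1,
        )
    return boxes


# Tiny synthetic mask: two "characters" on a 4x6 canvas.
demo = np.zeros((4, 6, 3), dtype=np.uint8)
demo[1:3, 0:2] = (0, 0, 1)
demo[0:4, 3:5] = (0, 0, 2)
print(char_bounding_boxes(demo))  # {1: (0, 1, 2, 3), 2: (3, 0, 5, 4)}
```

On skewed or blurred images the boxes would only be axis-aligned approximations, and blurred edges may need thresholding first.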

Belval commented 4 years ago

Branch is https://github.com/Belval/TextRecognitionDataGenerator/tree/output-mask

Belval commented 4 years ago

The branch is now merged into master; masks can be generated by using -om 1

[Images: acculturation_1, acculturation_1_mask]

The mask is not really human-friendly: each character is drawn with a pixel value one higher than the previous character's.

Here are the colors for each letter in this case:

a => (0, 0, 1)
c => (0, 0, 2)
c => (0, 0, 3)
u => (0, 0, 4)
...

This gives us a maximum character count of 256³ - 1, or 16777215.

Two things need to be added before the issue can be closed:

Proper documentation for the feature
A sample script to convert that format to a more "normal" binary numpy array.
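A conversion script of the kind mentioned in the second item might look roughly like the sketch below. It is not the script from the repository; the function name is hypothetical, and it assumes that (0, 0, 0) marks the background and that the mask has already been loaded as an (H, W, 3) uint8 array.

```python
import numpy as np


def mask_to_binary_stack(mask_rgb: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Turn an (H, W, 3) label mask into (ids, stack), where stack[i] is the
    boolean (H, W) mask of the character whose label is ids[i]."""
    m = mask_rgb.astype(np.int64)
    labels = m[..., 0] * 65536 + m[..., 1] * 256 + m[..., 2]
    ids = np.unique(labels)
    ids = ids[ids != 0]  # assumption: label 0 is background
    # Broadcast (1, H, W) against (n, 1, 1) to get one binary plane per character.
    stack = labels[None, :, :] == ids[:, None, None]
    return ids, stack


# Tiny synthetic mask: one pixel for character 1, a column for character 2.
demo = np.zeros((2, 3, 3), dtype=np.uint8)
demo[0, 0] = (0, 0, 1)
demo[:, 2] = (0, 0, 2)
ids, stack = mask_to_binary_stack(demo)
print(ids.tolist(), stack.shape)  # [1, 2] (2, 2, 3)
```

A real mask could be loaded with Pillow, for example np.array(Image.open(path).convert("RGB")), where path points at one of the generated mask files.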

KhanhCon commented 4 years ago

Hi Edouard, does it now support bounding boxes around characters?

Belval commented 4 years ago

Right now it does not, is there a standard format you would like me to implement?

yyyash8 commented 4 years ago

> Right now it does not, is there a standard format you would like me to implement?

When the code writes text onto the background image, it must already have the coordinates of each character. Would it be possible to use those coordinates to get the bounding box values and save them to a txt file, the way the YOLO format does? E.g. for test.jpg (containing 2 characters), test.txt would contain 2 bounding boxes:
12 33 444 555
34 56 67 88
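For reference, YOLO label files as commonly used store one line per box in the form "class x_center y_center width height", with the four coordinates normalized by the image size. A minimal sketch of that conversion follows. It assumes pixel boxes given as (left, top, right, bottom); the function name and the single default class id are hypothetical choices, not something the generator provides.

```python
def to_yolo_lines(boxes, image_width, image_height, class_id=0):
    """Format (left, top, right, bottom) pixel boxes as YOLO-style label lines."""
    lines = []
    for left, top, right, bottom in boxes:
        x_center = (left + right) / 2 / image_width
        y_center = (top + bottom) / 2 / image_height
        width = (right - left) / image_width
        height = (bottom - top) / image_height
        lines.append(
            f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
        )
    return lines


# Two side-by-side characters on a 100x100 image.
demo_lines = to_yolo_lines([(0, 0, 50, 100), (50, 0, 100, 100)], 100, 100)
print("\n".join(demo_lines))
# 0 0.250000 0.500000 0.500000 1.000000
# 0 0.750000 0.500000 0.500000 1.000000
```

Writing the joined lines to test.txt next to test.jpg would give one label file per image.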

Belval commented 4 years ago

I can do that for sure, I'll tag you when the feature is ready.

yyyash8 commented 4 years ago

> I can do that for sure, I'll tag you when the feature is ready.

Thanks

jacksonthall22 commented 2 months ago

Hey @Belval, any progress on this feature? I'm trying to generate a synthetic handwriting dataset for character localization. This would be a huge help!