Open Belval opened 4 years ago
Yep, that would be interesting. Do you need help with that?
I already implemented the first option of outputting a mask with a different pixel value for each character. It's not ready to merge yet; I still need tests and support for the handwritten generators.
I will commit what I have so you can take a look. I also want to implement the format used in this paper, if you would like to try it: https://arxiv.org/pdf/1904.01941.pdf
Here is an example of the current output:
It is rather hard to see, but each character's color is treated as a "label" with the RGB value incremented for each character drawn.
Here you'd have (0, 0, 1), (0, 0, 2), ... (0, 0, 255), (0, 1, 0) etc...
To create bounding boxes around each character, I think skimage could be used: https://muthu.co/draw-bounding-box-around-contours-skimage/
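A minimal sketch of that idea, assuming the mask can be binarized so that each character ends up as a separate connected component (the function name and the toy mask below are mine, not from the project):

```python
import numpy as np
from skimage.measure import label, regionprops

def char_bounding_boxes(binary_mask):
    """Return (min_row, min_col, max_row, max_col) for every connected
    component (ideally one per character) in a boolean mask."""
    labeled = label(binary_mask)  # assigns a distinct integer to each blob
    return [region.bbox for region in regionprops(labeled)]

# Tiny synthetic mask with two separated "characters"
mask = np.zeros((10, 10), dtype=bool)
mask[1:4, 1:3] = True   # first blob
mask[6:9, 5:9] = True   # second blob
print(char_bounding_boxes(mask))
```

One caveat: touching or overlapping glyphs would merge into a single component here, which is exactly why drawing each character with its own pixel value is useful.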
Branch is now merged in master, masks can be generated by using -om 1
The mask is not really human-friendly: it works by incrementing the pixel value by one for each character drawn.
Here are the colors for each letter in this case:
a => (0, 0, 1), c => (0, 0, 2), c => (0, 0, 3), u => (0, 0, 4), ...
That gives us a maximum character count of 256³ - 1, or 16,777,215.
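Since each character already has a unique RGB value in the mask, its bounding box can be recovered without any contour detection. A sketch of the decoding, following the (R, G, B) increment scheme described above (the helper name is mine):

```python
import numpy as np

def mask_to_boxes(rgb_mask):
    """Map each character index (1-based) in a (H, W, 3) uint8 mask to its
    (x_min, y_min, x_max, y_max) bounding box in pixel coordinates."""
    # Collapse the three channels into one integer label per pixel:
    # label = R * 256^2 + G * 256 + B, matching the increment scheme.
    labels = (rgb_mask[..., 0].astype(np.int64) * 256 * 256
              + rgb_mask[..., 1].astype(np.int64) * 256
              + rgb_mask[..., 2].astype(np.int64))
    boxes = {}
    for value in np.unique(labels):
        if value == 0:  # 0 is the background
            continue
        ys, xs = np.nonzero(labels == value)
        boxes[int(value)] = (int(xs.min()), int(ys.min()),
                             int(xs.max()), int(ys.max()))
    return boxes

# Toy mask with two characters drawn as (0, 0, 1) and (0, 0, 2)
mask = np.zeros((5, 8, 3), dtype=np.uint8)
mask[1:3, 1:3, 2] = 1   # first character
mask[1:4, 4:7, 2] = 2   # second character
print(mask_to_boxes(mask))  # → {1: (1, 1, 2, 2), 2: (4, 1, 6, 3)}
```

This handles touching glyphs correctly, since pixels are grouped by label rather than by connectivity.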
Two things need to be added before the issue can be closed:
Hi Edouard, Does it now support bounding boxes around characters?
Right now it does not, is there a standard format you would like me to implement?
When the code writes text on the background image it has to have coordinates. Using those coordinates, could we get the bounding box values and save them to a txt file like the YOLO format uses? E.g. for test.jpg (containing 2 characters) >> test.txt (containing 2 bounding boxes): 12 33 444 555 / 34 56 67 88
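For reference, YOLO annotation lines are normally `class x_center y_center width height`, with all four coordinates normalized by the image size, rather than raw pixel corners. A hedged sketch of the conversion (the 640×640 image size here is just an example value):

```python
def to_yolo_line(box, img_w, img_h, class_id=0):
    """Convert an (x_min, y_min, x_max, y_max) pixel box to a YOLO
    annotation line: class id, then center x/y and width/height,
    all normalized to [0, 1] by the image dimensions."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2 / img_w
    cy = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"

# Using the first example box from the comment above, on a 640x640 image
print(to_yolo_line((12, 33, 444, 555), img_w=640, img_h=640))
# → 0 0.356250 0.459375 0.675000 0.815625
```

Writing one such line per character into `test.txt` would give the layout the comment describes.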
I can do that for sure, I'll tag you when the feature is ready.
Thanks
Hey @Belval, any progress on this feature? I'm trying to generate a synthetic handwriting dataset for character localization. This would be a huge help!
When generating images, it would be interesting to output the bounding box/mask of each character, to train localization models.
There are two possible implementations:
The MVP would be for it to work with non-skewed/non-blurred images, but ideally it should work for any configuration.