Closed Jayhello closed 7 years ago
Yes, it can be used to generate non-ASCII characters like Chinese, but you will need to do some adaptation.
You would need to (at least) make the following changes:
Be careful with Chinese fonts: some characters in your vocabulary may not be covered. Some fonts contain more than 10k characters, while others contain only the ~4k most common Chinese characters.
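One way to check font coverage before generating data is to compare your vocabulary against the code points the font actually maps. The sketch below is only an illustration and is not part of SynthText: the function names are hypothetical, and it assumes the third-party `fontTools` package is installed for reading the font file.

```python
def missing_characters(vocabulary, supported_codepoints):
    """Return the distinct non-space characters of `vocabulary`
    whose code points are not in `supported_codepoints`."""
    seen = set()
    missing = []
    for ch in vocabulary:
        if ch.isspace() or ch in seen:
            continue
        seen.add(ch)
        if ord(ch) not in supported_codepoints:
            missing.append(ch)
    return missing


def load_supported_codepoints(font_path, font_number=0):
    """Read the Unicode character map of a font file.

    Assumption: fontTools is available (pip install fonttools).
    `font_number` only matters for font collections such as .ttc files.
    """
    from fontTools.ttLib import TTFont  # imported lazily; third-party

    font = TTFont(font_path, fontNumber=font_number)
    cmap = font.getBestCmap()  # dict of code point -> glyph name, or None
    return set(cmap or {})


if __name__ == "__main__":
    # Toy example with a hand-written "font" that only covers two characters.
    toy_font = {ord("你"), ord("好")}
    print(missing_characters("你好世界", toy_font))
```

With a real font you would call `load_supported_codepoints("path/to/font.ttf")` (the path is a placeholder) and either drop the missing characters from the vocabulary or skip that font for words that contain them.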
@crazylyf I want to use SynthText to generate images containing natural Chinese words, and then do Chinese word recognition on those images. My question is:
@Jayhello
@crazylyf Thank you very much for your reply! I have read the paper you recommended, and my question is:
I know I should first localize the character sequence and then recognize it. For localizing characters, https://github.com/MhLiao/TextBoxes is useful.
@Jayhello Sorry, I have no exact answer on how many samples to prepare for each character. Usually, one generates samples from a given corpus, in which character frequencies vary widely: a common character like "你" occurs far more often than others. The text in your example seems to have been added afterwards with a photo-editing tool, so it may differ from the text synthesized here, which assumes that the text lies on well-defined regions. Perhaps you should try to relax or drop this constraint to suit your case.
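To see how uneven the character frequencies in a corpus are, and which characters would end up with few samples, a simple count is enough. This is a minimal sketch, not SynthText code: the function names and the `min_samples` threshold are hypothetical, and the right threshold depends on your application.

```python
from collections import Counter


def character_counts(lines):
    """Count how often each non-whitespace character occurs in the corpus."""
    counts = Counter()
    for line in lines:
        counts.update(ch for ch in line if not ch.isspace())
    return counts


def underrepresented(counts, vocabulary, min_samples):
    """Return vocabulary characters that occur fewer than `min_samples` times."""
    return sorted(ch for ch in set(vocabulary) if counts[ch] < min_samples)


if __name__ == "__main__":
    # Tiny in-memory corpus; a real one would be read from a text file.
    corpus = ["你好", "你们好吗", "你"]
    counts = character_counts(corpus)
    print(counts.most_common())
    print(underrepresented(counts, "你好吗们", min_samples=2))
```

Characters reported by `underrepresented` could be given extra text lines in the corpus, or sampled more often, so that rare characters are not starved of training images.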
The original image is below; the marks in the image were located with a deep learning model, [TextBoxes](https://github.com/MhLiao/TextBoxes).
And are you Chinese?
Yeah
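@crazylyf If I generate images to train a recognizer, what kind of images should I use? If I use the first image below, that is just like ordinary OCR, so it seems pointless? Something like the second image below, then? And how many images would I need? 1K images per character? Can one image contain many characters?
@Jayhello Unless you are doing document recognition, you should definitely use the second kind of image. As for how many samples each character needs, I have no data on that and have not run any experiments myself. My personal guess is roughly a few dozen, but it depends on the application.
@Jayhello I have a 163 mailbox; the username is crazylyf. If you are interested, let's talk privately.
@crazylyf Thank you very much. I also have code that renders text onto images and records the coordinates. Don't you have www.crazylyf@163.com?
@crazylyf SynthText should be able to generate images like this too, right?
@crazylyf https://github.com/MhLiao/TextBoxes Did you retrain it?
@xiaomaxiao TextBoxes? No.
@crazylyf So it can be used directly for Chinese character detection?
Sorry, I haven't read the paper, but I think it should work.
@crazylyf Wow, that's great. CTPN can also detect Chinese characters directly. How fast is TextBoxes on a CPU?
I haven't tried TextBoxes, so I'm not sure.
Sorry, I @-mentioned the wrong person, @Jayhello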
How can I use this to generate Chinese word images?