Question about the training set

JinpengLI / deep_ocr

make a better chinese character recognition OCR than tesseract

1.52k stars 486 forks

Question about the training set #23

Open tangsipeng opened 7 years ago

tangsipeng commented 7 years ago

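First of all, thank you for open-sourcing this project. My question: I saw the training set discussed in earlier issues, and I downloaded the data from Baidu Netdisk (百度网盘). I understand that the training set can be generated from font files. Does the training set really contain only one image per class? If not, how is the additional training data generated automatically?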

JinpengLI commented 7 years ago

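Yes. It looks like I did not implement skew or similar distortions. You could try adding rotation yourself, which would increase the amount of training data.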
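For readers following along, here is a minimal, dependency-free sketch of the rotation idea mentioned above. It is not code from deep_ocr: the function names `rotate_bitmap` and `augment_with_rotation`, the list-of-lists bitmap representation, and the default angles are all assumptions made for illustration. In practice an image library would normally do the rotation.

```python
import math


def rotate_bitmap(bitmap, angle_deg, fill=0):
    """Rotate a 2D list of pixel values about its center.

    Hypothetical helper: uses inverse mapping with nearest-neighbor
    sampling and keeps the original width and height.
    """
    height, width = len(bitmap), len(bitmap[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # For each output pixel, find the source pixel it comes from.
            dx, dy = x - cx, y - cy
            src_x = int(round(cos_t * dx + sin_t * dy + cx))
            src_y = int(round(-sin_t * dx + cos_t * dy + cy))
            if 0 <= src_x < width and 0 <= src_y < height:
                out[y][x] = bitmap[src_y][src_x]
    return out


def augment_with_rotation(bitmap, angles=(-10, -5, 0, 5, 10)):
    """Return one rotated copy of the glyph bitmap per angle (assumed angles)."""
    return [rotate_bitmap(bitmap, angle) for angle in angles]
```

With the assumed default angles, each rendered glyph image yields five training samples instead of one. Small angles are probably the sensible range, since large rotations can make a character unreadable.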

tangsipeng commented 7 years ago

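@JinpengLI I'm confused about one thing: can a model really be trained with one image per class? Do you mean it is enough to just loop over the data for many batches?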

JinpengLI commented 7 years ago

No. Each character has many font files, so each character has many images.
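To make the arithmetic concrete, here is a small sketch of how the number of images per class grows when every character is rendered with every font and, following the rotation suggestion earlier in the thread, at several angles. The function `build_sample_plan`, the font paths, and the angle list are hypothetical and do not come from the deep_ocr repository.

```python
from itertools import product


def build_sample_plan(chars, font_paths, angles):
    """List every (character, font, angle) combination.

    Each entry describes one training image that would be rendered.
    """
    return [
        {"label": ch, "font": font, "angle": angle}
        for ch, font, angle in product(chars, font_paths, angles)
    ]


chars = ["中", "文"]
# Hypothetical font files; substitute the fonts you actually have.
font_paths = ["fonts/a.ttf", "fonts/b.ttf", "fonts/c.ttf"]
angles = [-5, 0, 5]  # assumed rotation angles in degrees

plan = build_sample_plan(chars, font_paths, angles)
per_class = len(plan) // len(chars)
print(per_class)  # 3 fonts * 3 angles = 9 images per character
```

Under these assumptions the images per class equal the number of fonts times the number of angles, so even a handful of fonts gives more than one sample per character.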

tangsipeng commented 7 years ago

OK, understood. One more thing I'd like to confirm: is the whole recognition algorithm based on the paper Deep Convolutional Network for Handwritten Chinese Character Recognition?

terminats17 commented 7 years ago

Hello, a question: I used your font files and found that each character has fewer than 10 fonts. Can a model be trained with that?

JinpengLI commented 7 years ago

Is there a rotation parameter?

On Mon, Oct 9, 2017 at 6:58 PM, terminats17 notifications@github.com wrote:

Hello, a question: I used your font files and found that each character has fewer than 10 fonts. Can a model be trained with that?


lushilun commented 6 years ago

Hello, I downloaded the files, but where is the training set? I can't find train.txt or test.txt anywhere.