Dataset labels - Githubissues

AstarLight / Lets_OCR

A repository for OCR, which includes PyTorch implementations of some classical OCR algorithms such as CTPN, EAST and CRNN.

MIT License

656 stars 327 forks

Dataset labels #45

Open scutcr7 opened 5 years ago

scutcr7 commented 5 years ago

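The labels in the dataset are numbers, and those numbers do not correspond to the Chinese characters in the alphabets file. Is there a separate label file?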

lirunhualk commented 5 years ago

This problem does exist.

myhub commented 5 years ago

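The labels in the dataset are numbers, and those numbers do not correspond to the Chinese characters in the alphabets file. Is there a separate label file?

I ran into this problem too. The numbers simply do not line up with the alphabet.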

myhub commented 5 years ago

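The labels in the dataset are numbers, and those numbers do not correspond to the Chinese characters in the alphabets file. Is there a separate label file?

I ran into this problem too. The numbers simply do not line up with the alphabet.

It took me almost half a day, but I found the dictionary file online. Hopefully nobody who comes after me has to waste that time again: https://github.com/xiaomaxiao/keras_ocr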
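For readers hitting the same mismatch, the mapping from numeric labels to characters can be sketched roughly as below. This is a minimal sketch, not code from either repository. It assumes each label line has the form "image_name idx idx ..." and that the dictionary file holds one entry per line. The function names, the toy data, and the `offset` parameter (for dictionaries whose indices start at 1 rather than 0) are all hypothetical.

```python
from typing import List, Sequence, Tuple


def load_charset(lines: Sequence[str]) -> List[str]:
    """Build an index -> character table from dictionary lines (one entry per line)."""
    return [line.rstrip("\r\n") for line in lines]


def decode_label_line(line: str, charset: Sequence[str], offset: int = 0) -> Tuple[str, str]:
    """Split 'image_name idx idx ...' and map each index to a character.

    offset is an assumption: use 1 if the labels are 1-based relative to the table.
    """
    name, *indices = line.split()
    text = "".join(charset[int(i) - offset] for i in indices)
    return name, text


# Toy stand-ins for the dictionary file and one line of train.txt (hypothetical data).
toy_charset = load_charset(["blank\n", "中\n", "文\n", "字\n"])
name, text = decode_label_line("img_0001.jpg 1 2 3", toy_charset)
print(name, text)
```

With the real files, the lines would come from something like `open(path, encoding="utf-8")`. If the decoded text does not match what the image shows, the dictionary file or the offset is probably the wrong one for that label set.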

cpkensei commented 5 years ago

@myhub Thank you so much.

Crescentz commented 5 years ago

Is this a problem with the recognition dataset?

luyun760324 commented 4 years ago

@myhub Are you using the char_std_5990.txt file, with train.txt and test.txt left unchanged?

myhub commented 4 years ago

I was before, but not anymore. Now I use this one: https://github.com/myhub/tr/blob/master/tr/char_table.txt

luyun760324 commented 4 years ago

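@myhub With char_std_5990.txt I get this error: size mismatch for rnn.rnn.embedding2.weight: copying a param of torch.Size([5990, 512]) from checkpoint, where the shape is torch.Size([6736, 512]) in current model. That is not right either; the label numbers in train.txt and test.txt still do not match the dictionary entries. How did you get the label numbers in train.txt and test.txt to match the dictionary characters when using char_std_5990.txt?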

myhub commented 4 years ago

Only the one with 5990 characters. That file applies only to the dataset of 3 million-plus samples.
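The size mismatch reported above suggests that the checkpoint and the current model were built from character tables of different sizes. A rough way to check this before loading weights is sketched below. It is an illustration only: the helper names are hypothetical, the character table is a placeholder, and whether the model reserves extra classes (for example a CTC blank) on top of the table is an assumption exposed through `extra_classes`.

```python
from typing import Sequence, Tuple


def expected_num_classes(charset: Sequence[str], extra_classes: int = 0) -> int:
    """Output size implied by a character table plus any reserved classes."""
    return len(charset) + extra_classes


def check_output_layer(
    weight_shape: Tuple[int, int],
    charset: Sequence[str],
    extra_classes: int = 0,
) -> bool:
    """Return True when the layer's first dimension equals the expected class count."""
    return weight_shape[0] == expected_num_classes(charset, extra_classes)


# Shapes copied from the error message; the table contents are placeholders.
checkpoint_shape = (5990, 512)
model_shape = (6736, 512)
table_5990 = ["?"] * 5990

print(check_output_layer(checkpoint_shape, table_5990))  # True
print(check_output_layer(model_shape, table_5990))       # False
```

If the checkpoint file is a plain state dict (an assumption), the real shape could be read with `torch.load(path, map_location="cpu")["rnn.rnn.embedding2.weight"].shape` and passed in as `tuple(shape)`.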