Belval / CRNN

A TensorFlow implementation of https://github.com/bgshih/crnn
MIT License

pretrained model #31

Open tjpulfn opened 5 years ago

tjpulfn commented 5 years ago

Hello, I have to trouble you again. Can the pretrained model be tested on Chinese text? When I test with Chinese I get this error:

File "/Users/liufengnan/workspace/OCR/CRNN/CRNN/utils.py", line 48, in <listcomp>
    return [config.CHAR_VECTOR.index(x) for x in label]
ValueError: substring not found

I then changed CHAR_VECTOR in config.py to use Chinese characters, but restoring the checkpoint now fails with a shape mismatch:

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [512,3992] rhs shape= [512,70]
[[Node: save/Assign = Assign[T=DT_FLOAT, _class=["loc:@W"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](W, save/RestoreV2)]]

I hope you can understand my English; it is not very good.

Belval commented 5 years ago

The pretrained model uses English letters I'm afraid. If you wish to use it with Chinese you will have to retrain it.
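For context, the shape mismatch in the error above comes from the output layer: its width is derived from the length of CHAR_VECTOR, so the English checkpoint (70 output units) cannot be restored into a graph built for a 3992-class Chinese charset. A rough sketch of the change, assuming a config.py along these lines (only CHAR_VECTOR is known to exist in the repo; the other name is illustrative):

```python
# config.py -- sketch only; NUM_CLASSES is an illustrative name, not
# necessarily what this repo calls it.

# Every character that can appear in a training label must be listed here.
# The size of the final [512, N] weight in the error above is derived from
# this string, which is why the English checkpoint (N = 70) can no longer
# be restored once the charset is swapped for Chinese.
CHAR_VECTOR = "的一是了不在有人这他们中来上大..."  # your full Chinese character set

NUM_CLASSES = len(CHAR_VECTOR) + 1  # +1 for the CTC blank symbol
```

With the new charset in place, train from scratch instead of restoring the English checkpoint.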

tjpulfn commented 5 years ago

Yes, I am training the model with Chinese, but the loss stays high. For example:

[21] Iteration loss: 388.9223213195801
[22] Iteration loss: 386.60620498657227
[23] Iteration loss: 384.27929306030273
[24] Iteration loss: 382.09375
[25] Iteration loss: 380.0574035644531
[26] Iteration loss: 378.1801071166992
[27] Iteration loss: 376.39180755615234
[28] Iteration loss: 374.7563133239746

Also, training is very, very slow. Is this normal?

Belval commented 5 years ago

Yes, the network itself takes quite a long time to train, I'm afraid.

Also, the data feeding system I used here (custom batches + feed_dict) is terrible, so training is slow.
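If the feed_dict pipeline is the bottleneck, the usual TF1-era replacement is a tf.data input pipeline that feeds tensors directly into the graph. A minimal sketch (the generator, shapes and batch size are placeholders, not taken from this repo):

```python
import numpy as np
import tensorflow as tf

def sample_generator():
    # Placeholder generator: replace with this repo's own image/label loading.
    # It must yield fixed dtypes; here images are HxWx1 float32 and labels are
    # padded int32 code vectors.
    for _ in range(1000):
        yield np.zeros((32, 100, 1), np.float32), np.zeros((25,), np.int32)

dataset = (
    tf.data.Dataset.from_generator(
        sample_generator,
        output_types=(tf.float32, tf.int32),
        output_shapes=((32, 100, 1), (25,)),
    )
    .shuffle(buffer_size=512)
    .batch(64)
    .prefetch(2)  # overlap data preparation with training steps
)

# TF1 style: build the graph on the iterator's tensors instead of feeding
# placeholders through feed_dict at every step.
iterator = dataset.make_one_shot_iterator()
images, labels = iterator.get_next()
# Note: tf.nn.ctc_loss expects labels as a SparseTensor, so the padded labels
# above would still need to be converted before computing the loss.
```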

tjpulfn commented 5 years ago

Hello. When training the model with Chinese, at first the decode at https://github.com/Belval/CRNN/blob/51b2ebe6c8d7a0dec6df1339bf404507301229a3/CRNN/crnn.py#L195 gives results like: 天 败 袁 唐 铛 董 撰按漱按氯按氯按网按网残 蒿 爵 樟 怕 鸿 狙 按氯按氯按氯按氯按网按网 柯 岸 邱 亚 可 块 悯哥喻哥 哥 哥 贝 纳 鲤 讯 濯 捕 纣悯哥沃哥 沃 沃 铃 丫 征 贿 琴 使 齐 齐 齐 齐 昙 常 美 养 明 圆 齐 齐 齐

and then the result is:

玲 交 讽 砷 菇 蜡

质 唁 伪 咏 袋 紫

砸 头 倦 哨 躬 液

泪 轿 沼 厩 浑 充

士 救 晏 莎 辅 宋

矢 拖 流 慷 稗 桥

阿 逮 凤 杀 翊 款

腰 有 叨 更 丐 蜈

悼 闺 咋 询 嘁 咋

芯 侣 玉 奏 钠 伶

纬 构 邮 谅 指 竟

箴 廉 妆 坚 叔 隔

but 'decoded' is empty. Why is that? What am I doing wrong?

Belval commented 5 years ago

Hi,

Make sure that you edited the CHAR_VECTOR string before training.

Regards
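One way to catch this before training is to check that every character in the labels is covered by CHAR_VECTOR, since a missing character both triggers the "substring not found" error and breaks the label encoding. A small sanity-check sketch (the import path and label list are assumptions, not the repo's actual loader):

```python
# Assumes config.py (the file defining CHAR_VECTOR) is importable like this.
import config

def find_missing_chars(labels):
    """Return the set of label characters that are not in CHAR_VECTOR."""
    charset = set(config.CHAR_VECTOR)
    return {c for label in labels for c in label if c not in charset}

# Hypothetical usage: `labels` is whatever list of ground-truth strings you train on.
labels = ["锯祷沫官声慢", "玲交讽砷菇蜡"]
missing = find_missing_chars(labels)
if missing:
    print("Add these characters to CHAR_VECTOR:", "".join(sorted(missing)))
```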

StromWine commented 5 years ago

@tjpulfn Hi! I have a similar problem when I use this model to train with Chinese. Could you tell me how you eventually solved it? Thank you!

kienchen commented 5 years ago

I used the pretrained model under Windows 10 with: python run.py -ex ..\samples --test --restore. The result is empty. The numbers 1 through 10 are printed while the data is loading, but no prediction is output after Testing:

Loading data
1 2 3 4 5 6 7 8 9 10
Testing