KeyError when starting training: the mapping cannot be found in the dictionary.

terrancraft commented 3 years ago

Hello, I am training with the code from your crnn.pytorch repository, but I hit a KeyError. The training log is here: error_log

I prepared the dataset myself. The label file has the following format:

img/word_20347.png 中國國際航空公司
img/word_20351.png 头等舱
img/word_20363.png 北京市科普教育基地

Following your README, I used gen_key.py to generate the dictionary file from the labels above. An excerpt of its content: 迫迹追送适选透递途通速造逸逻
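For readers unfamiliar with this step, here is a minimal sketch of how a dictionary file could be built from such a label file. It is not the repository's gen_key.py; the function name `build_alphabet` and the assumption that every line has the form `image_path<whitespace>label` are hypothetical and only for illustration.

```python
from pathlib import Path


def build_alphabet(label_path, out_path):
    """Collect every distinct label character and write them to out_path."""
    chars = set()
    # Read as UTF-8 explicitly so the result does not depend on the OS locale.
    with open(label_path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if len(parts) < 2:
                continue  # skip blank lines and lines without a label
            # Ignore whitespace inside labels (assumption: blanks are not classes).
            chars.update(ch for ch in parts[1] if not ch.isspace())
    alphabet = "".join(sorted(chars))
    Path(out_path).write_text(alphabet, encoding="utf-8")
    return alphabet
```

Whatever file name is chosen for `out_path` has to be the same name that the `alphabet` entry of the config points to.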

The YAML config I used is below (it is actually the imagedataset_None_VGG_RNN_CTC.yaml file from your repository):
{'name': 'crnn', 'base': ['config/image_dataset.yaml'], 'arch': {'type': 'Model', 'trans': {'type': 'None', 'input_size': [32, 320], 'num_fiducial': 20}, 'backbone': {'type': 'VGG', 'conv_type': 'BasicConv'}, 'neck': {'type': 'RNNDecoder', 'hidden_size': 256}, 'head': {'type': 'CTC'}}, 'loss': {'type': 'CTCLoss', 'blank': 0}, 'optimizer': {'type': 'Adam', 'args': {'lr': 0.001}}, 'lr_scheduler': {'type': 'StepLR', 'args': {'step_size': 30, 'gamma': 0.1}}, 'trainer': {'seed': 2, 'gpus': [0], 'epochs': 10, 'log_iter': 10, 'resume_checkpoint': '', 'finetune_checkpoint': '', 'output_dir': 'output', 'tensorboard': True}, 'dataset': {'alphabet': 'digit.txt', 'train': {'dataset': {'type': 'ImageDataset', 'args': {'data_path': [['path/train.txt']], 'data_ratio': [1.0], 'pre_processes': [{'type': 'Resize', 'args': {'img_h': 32, 'img_w': 120, 'pad': True, 'random_crop': False}}], 'transforms': [{'type': 'ToTensor', 'args': {}}], 'img_mode': 'RGB', 'ignore_chinese_punctuation': True, 'remove_blank': True}}, 'loader': {'batch_size': 16, 'shuffle': True, 'pin_memory': False, 'num_workers': 6}}, 'validate': {'dataset': {'type': 'ImageDataset', 'args': {'data_path': ['path/val.txt'], 'pre_processes': [{'type': 'Resize', 'args': {'img_h': 32, 'img_w': 120, 'pad': True, 'random_crop': False}}], 'transforms': [{'type': 'ToTensor', 'args': {}}], 'img_mode': 'RGB', 'ignore_chinese_punctuation': True, 'remove_blank': True}}, 'loader': {'batch_size': 4, 'shuffle': True, 'pin_memory': False, 'num_workers': 6}}}}

Also, my operating system is a non-Chinese one. Could text encoding be the cause of the error? I hope to hear back from you. Thanks!
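On the encoding question in general: in Python, `open()` without an `encoding` argument usually falls back to the locale's preferred encoding, which on a non-Chinese system is not necessarily UTF-8. A small sketch of one way to rule this out is to pass the encoding explicitly. The helper name `read_label_lines` is hypothetical, and the sketch assumes the label and dictionary files are saved as UTF-8.

```python
import locale


def read_label_lines(path):
    """Read non-empty lines from a UTF-8 text file, independent of the OS locale."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f if line.strip()]


# Shows which encoding open() would typically use when none is given.
print(locale.getpreferredencoding(False))
```

If the printed value is something other than a UTF-8 variant, any code path that opens these files without `encoding="utf-8"` may decode Chinese characters incorrectly.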

terrancraft commented 3 years ago

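I found the cause myself. The config file specifies digit.txt, but the file generated by gen_key.py is named dict.txt. Because the file names do not match, training loads the wrong alphabet and the label characters cannot be found in the dictionary.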
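A mismatch like this can be caught before training with a quick check that every label character appears in the alphabet file the config points to. The following is only a sketch under stated assumptions: `find_missing_chars` is a hypothetical helper, not part of the repository, and it assumes the alphabet file is plain UTF-8 text whose characters (line breaks excluded) are the classes.

```python
def find_missing_chars(alphabet_path, label_path):
    """Return the set of label characters that are absent from the alphabet file."""
    with open(alphabet_path, encoding="utf-8") as f:
        # Works for one-character-per-line files and single-line files alike.
        alphabet = set(f.read()) - {"\n", "\r"}
    missing = set()
    with open(label_path, encoding="utf-8") as f:
        for line in f:
            parts = line.strip().split(maxsplit=1)
            if len(parts) < 2:
                continue  # no label on this line
            missing.update(
                ch for ch in parts[1] if not ch.isspace() and ch not in alphabet
            )
    return missing
```

Running it with the `alphabet` path from the config and the training label file should return an empty set. A non-empty result lists exactly the characters that would fail the lookup.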

dengfenglai321 commented 3 years ago

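I found the cause myself. The config file specifies digit.txt, but the file generated by gen_key.py is named dict.txt. Because the file names do not match, training loads the wrong alphabet and the label characters cannot be found in the dictionary.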

Which project is this issue from? And how do you generate the key file? I ran into the same problem.

WenmuZhou / PytorchOCR

KeyError when starting training: the mapping cannot be found in the dictionary. #112