Sierkinhane / CRNN_Chinese_Characters_Rec

(CRNN) Chinese Characters Recognition.

KeyError: '\x00' raised during training #10

Closed: black107 closed this issue 6 years ago

black107 commented 6 years ago

Start val
Traceback (most recent call last):
  File "crnn_main.py", line 200, in <module>
    training()
  File "crnn_main.py", line 117, in training
    val(crnn, test_dataset, criterion)
  File "crnn_main.py", line 57, in val
    t, l = converter.encode(cpu_texts)
  File "/home/OCR/crnn_train/crnn_chinese_characters_rec-master/utils.py", line 101, in encode
    index = self.dict[char]
KeyError: '\x00'
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f425248dc88>>
Traceback (most recent call last):
  File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
    self._shutdown_workers()
  File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
    self.worker_result_queue.get()
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f47434f68d0>>
Traceback (most recent call last):
  File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
    self._shutdown_workers()
  File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
    self.worker_result_queue.get()
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused

The main problem is that a char cannot be found in the dict. I tracked down the data line that triggers the error: "61688125_428659907.jpg 时,塔吉尔的手不由得". It does not contain any character that is missing from the dictionary, so what could be causing this?

The training set I am using is Synthetic_Chinese_String_Dataset, and the dictionary is char_std_5990.txt.

Sierkinhane commented 6 years ago

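'\x00' is the NUL character (U+0000) in UTF-8. It is invisible when printed and is not the same as an ordinary space ('\x20'). The label you quoted does not visibly contain it, so something probably went wrong when you built the dataset. As a workaround, you can add the missing character to the alphabet in your alphabet.py.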
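For reference, '\x00' is the NUL character (code point 0), which differs from a space (code point 32). Rather than adding it to the alphabet, another option is to strip it from the labels before encoding. The sketch below is only an illustration: `clean_label` is a hypothetical helper, not part of this repository, and it assumes that dropping every Unicode control character (category `Cc`) is acceptable for your labels.

```python
import unicodedata


def clean_label(text):
    """Drop NUL and other control characters (Unicode category Cc).

    Hypothetical helper, not part of the repository. Ordinary spaces
    (category Zs) are kept.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


# Label from the report with a trailing NUL appended for illustration.
raw = "时,塔吉尔的手不由得\x00"

print(ord("\x00"), ord(" "))   # prints: 0 32
print(repr(clean_label(raw)))  # the NUL is gone, visible text is unchanged
```

If the labels are cleaned like this when the dataset is built, the alphabet does not need an entry for '\x00'.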

black107 commented 6 years ago

@Sierkinhane It was indeed a problem with the dataset. Solved, thanks.
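For anyone hitting the same error, one way to confirm that a label carries an invisible character is to print it with `repr()` and list the characters that are absent from the alphabet. This is a minimal sketch: `find_unknown_chars` and the tiny `alphabet` string are hypothetical stand-ins, and in practice the alphabet would come from char_std_5990.txt.

```python
def find_unknown_chars(label, alphabet):
    """Return (char, code point) pairs for label characters missing from alphabet."""
    known = set(alphabet)
    return [(ch, "U+%04X" % ord(ch)) for ch in label if ch not in known]


# Hypothetical miniature alphabet that covers only the visible text.
visible = "时,塔吉尔的手不由得"
alphabet = visible

# Same label with a trailing NUL, which print() would not show.
label = visible + "\x00"

print(repr(label))                          # repr() exposes the '\x00'
print(find_unknown_chars(label, alphabet))  # lists the NUL with its code point
```

Running this over the failing label shows whether the KeyError comes from a hidden character rather than from a visible one that is missing from the dictionary.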

jamesbondzhou commented 5 years ago

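Hi, my dataset is also Synthetic_Chinese_String_Dataset. Each line of the train.txt I use to generate the lmdb data is an image name followed by the Chinese label, for example: 20457281_3395886438.jpg 美丽的传说》——美丽。 I built the data with Python 3, and running train.py fails with: utils.py, line 45, in for char in text KeyError: '那'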
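A quick way to narrow down a KeyError like this is to scan the whole label file and count every character that the alphabet does not contain. The sketch below assumes each line has the form "image_name label" separated by a single space, as in the example above. `missing_chars`, the second sample line, and the miniature alphabet are hypothetical and only serve the illustration.

```python
import io
from collections import Counter


def missing_chars(label_lines, alphabet):
    """Count label characters that are absent from the alphabet.

    Assumes each line is "<image name> <label>"; lines without a label
    are skipped.
    """
    known = set(alphabet)
    missing = Counter()
    for line in label_lines:
        parts = line.rstrip("\r\n").split(" ", 1)
        if len(parts) < 2:
            continue
        missing.update(ch for ch in parts[1] if ch not in known)
    return missing


# io.StringIO stands in for open("train.txt", encoding="utf-8").
sample = io.StringIO(
    "20457281_3395886438.jpg 美丽的传说》——美丽。\n"
    "hypothetical.jpg 那美丽\n"
)
alphabet = "美丽的传说》—。"  # hypothetical miniature alphabet

print(missing_chars(sample, alphabet))
```

When reading the real train.txt and alphabet files under Python 3, passing `encoding="utf-8"` to `open()` avoids relying on the platform default. A mismatched encoding is one possible, unconfirmed cause of a KeyError on a common character such as '那'.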