Closed: black107 closed this issue 6 years ago.
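'\x00' is the NUL (null) character in UTF-8, not an ordinary space (a space is '\x20'). The label itself contains no such character, so there was probably a slip when you built the dataset. Alternatively, you can add the missing character to the alphabet in your alphabet.py.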
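'\x00' (code point 0) and a space (code point 32) are different characters, so putting a space in the alphabet does not cover a NUL. A quick way to see exactly which characters a label is missing, assuming the alphabet is a plain Python string (unknown_chars and the tiny alphabet below are hypothetical, not part of the repo):

```python
def unknown_chars(label, alphabet):
    """Return (position, character) pairs for label characters missing from the alphabet."""
    known = set(alphabet)
    return [(i, ch) for i, ch in enumerate(label) if ch not in known]


# Hypothetical tiny alphabet that already contains an ordinary space.
alphabet = "的一是 了"

print(ord("\x00"), ord(" "))                 # 0 32
print(unknown_chars("的一 是", alphabet))     # []
print(unknown_chars("的一\x00是", alphabet))  # [(2, '\x00')]
```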
@Sierkinhane It was indeed a problem with the dataset. Solved, thanks.
Hi, my dataset is also Synthetic_Chinese_String_Dataset. The train.txt I used to generate the LMDB data has one image name plus a Chinese label per line, for example: 20457281_3395886438.jpg 美丽的传说》——美丽。
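A minimal Python 3 sketch of reading one train.txt line in the format described above (image name, one space, label) and storing the label as UTF-8 bytes, since LMDB values are bytes in Python 3. The helper parse_label_line and the zero-padded buffer at the end are hypothetical: the padding case only illustrates one way a '\x00' could end up in a decoded label, not a diagnosis of this dataset.

```python
def parse_label_line(line):
    """Split one 'image_name label' line from train.txt on the first space."""
    # Strip only the line terminator: with Windows line endings, a leftover '\r'
    # would otherwise become an extra, invisible character at the end of the label.
    line = line.rstrip("\r\n")
    name, sep, label = line.partition(" ")
    if not sep or not label:
        raise ValueError(f"malformed line: {line!r}")
    return name, label


name, label = parse_label_line("20457281_3395886438.jpg 美丽的传说》——美丽。\r\n")
print(name)   # 20457281_3395886438.jpg
print(label)  # 美丽的传说》——美丽。

# In Python 3, store the label as UTF-8 bytes and decode it with the same codec.
stored = label.encode("utf-8")
assert stored.decode("utf-8") == label

# Hypothetical failure mode: bytes taken from a zero-padded, fixed-width buffer keep
# their NULs after decoding, which yields exactly the '\x00' seen in a KeyError.
padded = stored.ljust(64, b"\x00").decode("utf-8")
print("\x00" in padded, padded.rstrip("\x00") == label)  # True True
```

If stray NULs do turn out to be the cause, stripping them (or fixing the writer) before the label reaches the encoder is one option; checking where they come from is the safer fix.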
I built the data with Python 3, and running train.py gives this error:
utils.py, line 45, in
Start val
Traceback (most recent call last):
  File "crnn_main.py", line 200, in <module>
    training()
  File "crnn_main.py", line 117, in training
    val(crnn, test_dataset, criterion)
  File "crnn_main.py", line 57, in val
    t, l = converter.encode(cpu_texts)
  File "/home/OCR/crnn_train/crnn_chinese_characters_rec-master/utils.py", line 101, in encode
    index = self.dict[char]
KeyError: '\x00'
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f425248dc88>>
Traceback (most recent call last):
  File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
    self._shutdown_workers()
  File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
    self.worker_result_queue.get()
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
Exception ignored in: <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f47434f68d0>>
Traceback (most recent call last):
  File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 349, in __del__
    self._shutdown_workers()
  File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 328, in _shutdown_workers
    self.worker_result_queue.get()
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/queues.py", line 337, in get
    return _ForkingPickler.loads(res)
  File "/home/anaconda3/envs/pytorch/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 70, in rebuild_storage_fd
    fd = df.detach()
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
    with _resource_sharer.get_connection(self._id) as conn:
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
    c = Client(address, authkey=process.current_process().authkey)
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 487, in Client
    c = SocketClient(address)
  File "/home/anaconda3/envs/pytorch/lib/python3.6/multiprocessing/connection.py", line 614, in SocketClient
    s.connect(address)
ConnectionRefusedError: [Errno 111] Connection refused
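The first traceback ends in converter.encode doing self.dict[char] for each character, so one unexpected character anywhere in a validation batch aborts the run (the DataLoader errors that follow look like fallout from that crash). As a debugging aid, here is a hypothetical encoder modeled on that lookup; encode_labels and the start-at-1 index convention are assumptions, not the repo's utils.py. It reports which label and position failed, with the code point, instead of a bare KeyError: '\x00'.

```python
def encode_labels(texts, char_to_index):
    """Map labels to one flat index list plus per-label lengths (the usual CTC layout).

    On an unknown character, raise a KeyError that names the label, the position,
    and the code point, instead of only the character itself.
    """
    indices, lengths = [], []
    for text in texts:
        for pos, ch in enumerate(text):
            if ch not in char_to_index:
                raise KeyError(f"{ch!r} (U+{ord(ch):04X}) at position {pos} of {text!r}")
            indices.append(char_to_index[ch])
        lengths.append(len(text))
    return indices, lengths


alphabet = "时,塔吉尔的手不由得"
# Assumption: index 0 is reserved for the CTC blank, so characters start at 1.
char_to_index = {ch: i + 1 for i, ch in enumerate(alphabet)}

print(encode_labels(["塔吉尔", "的手"], char_to_index))  # ([3, 4, 5, 6, 7], [3, 2])
try:
    encode_labels(["的手\x00"], char_to_index)
except KeyError as err:
    print(err.args[0])  # '\x00' (U+0000) at position 2 of '的手\x00'
```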
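The main problem is that a char cannot be found in the dict. I located the line of data that triggers the error, "61688125_428659907.jpg 时,塔吉尔的手不由得", and it contains no characters that are missing from the dictionary. What could be causing this?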
The training set I am currently using is Synthetic_Chinese_String_Dataset, and the dictionary is char_std_5990.txt.
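When the offending line looks clean on screen, the culprit is usually an invisible character such as '\x00' or a trailing '\r'. A minimal sketch for checking a label against the dictionary, assuming char_std_5990.txt is UTF-8 with one character per line (load_charset and missing_code_points are hypothetical helpers, and the in-memory dictionary is made up), prints code points so invisible characters show up. Since the traceback shows the failure inside val() on test_dataset, the validation labels are worth scanning as well as the training ones.

```python
import io


def load_charset(lines):
    """Collect characters from a dictionary file with one character per line (assumed format)."""
    chars = set()
    for line in lines:
        ch = line.rstrip("\r\n")  # keep a literal space entry, drop only the line terminator
        if len(ch) == 1:          # skip empty or multi-character lines instead of guessing
            chars.add(ch)
    return chars


def missing_code_points(label, charset):
    """List code points of label characters not in the dictionary; invisible ones show up too."""
    return [f"U+{ord(ch):04X}" for ch in label if ch not in charset]


# Stand-in for open("char_std_5990.txt", encoding="utf-8"); this content is made up.
fake_dictionary = io.StringIO("时\n,\n塔\n吉\n尔\n的\n手\n不\n由\n得\n")
charset = load_charset(fake_dictionary)

print(missing_code_points("时,塔吉尔的手不由得", charset))      # []
print(missing_code_points("时,塔吉尔的手不由得\x00", charset))  # ['U+0000']
print(missing_code_points("时,塔吉尔的手不由得\r", charset))    # ['U+000D']
```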