AstarLight / Lets_OCR

A repository for OCR, which includes PyTorch implementations of some classical OCR algorithms such as CTPN, EAST and CRNN.
MIT License
656 stars, 327 forks

AssertionError: index range error #56

Closed wjx-git closed 5 years ago

wjx-git commented 5 years ago

Traceback (most recent call last):
  File "/home/ayg/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/ayg/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/ayg/workspace/crnn/lib/dataset.py", line 59, in __getitem__
    return self[index + 1]
  File "/home/ayg/workspace/crnn/lib/dataset.py", line 59, in __getitem__
    return self[index + 1]
  File "/home/ayg/workspace/crnn/lib/dataset.py", line 59, in __getitem__
    return self[index + 1]
  [Previous line repeated 184 more times]
  File "/home/ayg/workspace/crnn/lib/dataset.py", line 44, in __getitem__
    assert index <= len(self), 'index range error'
AssertionError: index range error

Before the error was raised, a long run of corrupted-image messages was printed:

Corrupted image for 1628
Corrupted image for 1630
Corrupted image for 1632
Corrupted image for 1634
...

The error is raised from this code:

    def _process_next_batch(self, batch):
        self.rcvd_idx += 1
        self._put_indices()
        if isinstance(batch, _utils.ExceptionWrapper):
            # make multiline KeyError msg readable by working around
            # a python bug https://bugs.python.org/issue2651
            if batch.exc_type == KeyError and "\n" in batch.exc_msg:
                raise Exception("KeyError:" + batch.exc_msg)
            else:
                raise batch.exc_type(batch.exc_msg)
        return batch

Has anyone run into this problem? Is it related to the "Corrupted image" messages?
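For readers following along, the traceback is consistent with a dataset whose `__getitem__` falls through to the next index whenever an image fails to decode. The sketch below is only an illustration of that pattern, not the repository's actual `dataset.py`: the class `SkippingDataset` and its `corrupted` argument are hypothetical, and the lmdb read is replaced by a set lookup. It shows that when every remaining sample is unreadable, the recursion eventually steps past the end and trips the same assertion.

```python
class SkippingDataset:
    """Hypothetical stand-in for the lmdb-backed dataset in the traceback."""

    def __init__(self, n_samples, corrupted):
        self.n_samples = n_samples
        # Indices whose image would fail to decode (assumption for the demo).
        self.corrupted = set(corrupted)

    def __len__(self):
        return self.n_samples

    def __getitem__(self, index):
        # Same assertion message as in the traceback.
        assert index <= len(self), 'index range error'
        if index in self.corrupted:
            print('Corrupted image for %d' % index)
            # Skip the bad sample by recursing to the next index.
            return self[index + 1]
        return index  # a real dataset would return (image, label)


# One bad sample: the dataset silently returns the next one instead.
healthy = SkippingDataset(10, corrupted={3})
print(healthy[3])

# Every sample bad: the recursion runs off the end of the dataset.
all_bad = SkippingDataset(10, corrupted=range(0, 11))
try:
    all_bad[5]
except AssertionError as exc:
    print('AssertionError:', exc)
```

If this is what is happening, the assertion is only a symptom, and the thing to investigate is why so many images are unreadable.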

wjx-git commented 5 years ago

The cause is indeed the corrupted images. When creating the lmdb data, str values have to be converted to bytes, so I changed the code from the original to the following:

    def writeCache(env, cache):
        with env.begin(write=True) as txn:
            for k, v in cache.items():
                # Image data is bytes, labels are str, and the key k is str;
                # str values must be converted to bytes
                if isinstance(v, bytes):
                    txn.put(k.encode(), v)  # store the key and value
                elif isinstance(v, str):
                    txn.put(k.encode(), v.encode())

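My original code was:

    def writeCache(env, cache):
        with env.begin(write=True) as txn:
            for k, v in cache.items():
                txn.put(k.encode(), str(v).encode())  # store the key and value

But v is sometimes bytes and sometimes str. The code above converts a value that is already bytes to bytes a second time, so the images could not be read back during training, which is what produced the "Corrupted image" messages.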
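To see why the second conversion breaks the image, it may help to look at what `str()` does to a bytes object in Python 3: it returns the printable representation (`b'...'` with escaped bytes), not the data itself. The following is a minimal sketch without lmdb; the helper names `encode_value_buggy` and `encode_value_fixed` are made up for illustration, and the four-byte `jpeg_header` is just sample input.

```python
def encode_value_buggy(v):
    # Mirrors str(v).encode() from the original writeCache.
    return str(v).encode()


def encode_value_fixed(v):
    # Mirrors the isinstance checks from the corrected writeCache.
    if isinstance(v, bytes):
        return v
    if isinstance(v, str):
        return v.encode()
    raise TypeError('expected bytes or str, got %s' % type(v).__name__)


jpeg_header = b'\xff\xd8\xff\xe0'  # sample bytes standing in for image data

buggy = encode_value_buggy(jpeg_header)
fixed = encode_value_fixed(jpeg_header)

print(len(jpeg_header), len(buggy))  # the buggy value is longer than the input
print(buggy[:2])                     # starts with the literal characters b'
print(fixed == jpeg_header)          # the fixed value is unchanged
print(encode_value_fixed('label'))   # str labels are still encoded
```

The stored value then begins with the text `b'` rather than the image's own header bytes, so an image decoder cannot recognise it. Databases written with the old code would need to be regenerated after the fix.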