luyun760324 commented 4 years ago

    Corrupted image for 1
    Traceback (most recent call last):
      File "/home/wdj/mycode3/Lets_OCR/recognizer/crnn/lib/dataset.py", line 132, in __getitem__
        img = Image.open(buf).convert('L')
      File "/root/anaconda3/envs/Pytorch_CRNN3/lib/python3.6/site-packages/PIL/Image.py", line 2821, in open
        raise IOError("cannot identify image file %r" % (filename if filename else fp))
    OSError: cannot identify image file <_io.BytesIO object at 0x7f6110132308>

ShangLe0607 commented 4 years ago

I ran into this error before as well. Are your labels digits or Chinese?

luyun760324 commented 4 years ago

Did you get your problem solved? My labels are digits.

huitang commented 4 years ago

I'm hitting this problem too. My labels are Chinese.

Ryansanity commented 4 years ago

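Did you get your problem solved? My labels are digits.

Hello, my labels are the digits that correspond to Chinese characters, but I still get the error above. Could someone explain what causes it?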

paohaijiao commented 4 years ago

Has this been solved?

oweiii commented 4 years ago

Has this been solved? I have tried both Chinese and English labels.

htyquq commented 10 months ago

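My guess is that you read the image like this:

    with open(imagePath, 'rb') as f:  # the image must be opened in 'rb' mode
        imageBin = f.read()

and that the dataset was created with the following code:

    with env.begin(write=True) as txn:
        for k, v in cache.items():
            txn.put(str(k).encode(), str(v).encode())

This wrongly encodes the bytes data a second time. A bytes object cannot be encoded directly, so str(v) first turns it into the text "b'...'" and encode() then stores that text instead of the image. After decoding, the content looks the same as the original, but one is a str and the other is bytes. The reading code then does:

    buf = six.BytesIO()  # create an in-memory buffer
    buf.write(imgbuf)    # write the binary image data
    buf.seek(0)          # move the pointer back to the start of the file
                         # (optional second argument whence: 0 = start, 1 = current position, 2 = end)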

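When you write to an empty file and then read from it, call seek(0) after writing so that the pointer returns to the start of the file and the data can be read back.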
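To illustrate the seek(0) point, here is a minimal sketch that uses io.BytesIO from the standard library (on Python 3, six.BytesIO is the same class). The payload is an assumption made for the example: only the 8-byte PNG signature, standing in for real image data.

```python
import io

# Stand-in payload (assumption): the 8-byte PNG signature, not a full image.
imgbuf = b"\x89PNG\r\n\x1a\n"

buf = io.BytesIO()            # empty in-memory file
buf.write(imgbuf)             # the position now sits at the end of the data
without_rewind = buf.read()   # reading from the end returns nothing

buf.seek(0)                   # offset 0 with the default whence=0: back to the start
with_rewind = buf.read()      # now the whole payload is returned

print(without_rewind)
print(with_rewind == imgbuf)
```

Passing the bytes straight to the constructor, as in io.BytesIO(imgbuf), starts at position 0 and needs no seek.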

        try:
            img = Image.open(buf).convert('L')

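Here imgbuf must be of type bytes for the image to open. My modified code is below; the fix is simply not to encode the image data:

    for k, v in cache.items():
        if 'image' in k:
            txn.put(str(k).encode(), v)
        else:
            txn.put(str(k).encode(), str(v).encode())  # this writes encoded data, so reading it back needs decode() or another decoding step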
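The difference between the two write paths can be checked without lmdb or PIL. The sketch below is only an illustration under stated assumptions: DictTxn is a hypothetical in-memory stand-in for an lmdb write transaction, the key names are made up, and the payload is just the PNG signature. It also swaps the key-name test for a type check (isinstance(v, bytes)), which is a variant of the fix above rather than the same code.

```python
import ast

# Stand-in for raw image bytes (assumption): just the PNG signature.
image_bin = b"\x89PNG\r\n\x1a\n"


class DictTxn:
    """Hypothetical in-memory stand-in for an lmdb write transaction."""

    def __init__(self):
        self.store = {}

    def put(self, key, value):
        # Reject non-bytes so mistakes surface early
        # (lmdb on Python 3 also expects bytes for keys and values).
        if not isinstance(key, bytes) or not isinstance(value, bytes):
            raise TypeError("keys and values must be bytes")
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)


def write_cache(txn, cache):
    """Store bytes values untouched and everything else as encoded text."""
    for k, v in cache.items():
        if isinstance(v, bytes):
            txn.put(str(k).encode(), v)                # image data: no encoding
        else:
            txn.put(str(k).encode(), str(v).encode())  # labels, counters: text


# Hypothetical key names, chosen only for this example.
cache = {"image-000000001": image_bin, "label-000000001": "123", "num-samples": 1}

# What the buggy version stores: the text of the bytes repr, not the bytes.
broken = str(image_bin).encode()

txn = DictTxn()
write_cache(txn, cache)
stored = txn.get(b"image-000000001")

print(broken == image_bin)   # the buggy value no longer matches the image
print(stored == image_bin)   # the fixed path round-trips the image bytes

# A value written by the buggy version is still the literal text of a bytes
# object, so ast.literal_eval can turn it back into the original bytes.
recovered = ast.literal_eval(broken.decode())
print(recovered == image_bin)
```

If rebuilding an existing dataset is expensive, the ast.literal_eval step may be enough to read old entries, though I have not tried it on full-size images, and rewriting the dataset with raw bytes is the cleaner fix.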

AstarLight / Lets_OCR

Corrupted image error during training: bytes objects are written to lmdb, but what is read back from lmdb is different. Experts, please help, urgent. #75
