Jyouhou / UnrealText

Synthetic Scene Text from 3D Engines
MIT License

some imgs seem to be corrupted #29

Closed: aiboys closed this issue 2 years ago

aiboys commented 2 years ago

Hi, thanks for open-sourcing the dataset. While using the UnrealText dataset, I found that some images are corrupted. I followed the mmocr source code to pretrain my model, but loading some images raises an error:

    import mmcv
    from cv2 import IMREAD_COLOR

    with open(img_name, 'rb') as f:
        img_buff = f.read()
    img = mmcv.imfrombytes(img_buff, IMREAD_COLOR)

Occasionally, the code above raises: cv2.error: OpenCV(4.1.2) /io/opencv/modules/imgcodecs/src/loadsave.cpp:730: error: (-215:Assertion failed) !buf.empty() in function 'imdecode_'. It seems some image files are corrupted, but I am not sure which ones (still debugging to find them).
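The assertion `!buf.empty()` in `imdecode_` fires when OpenCV is handed a zero-length buffer, which suggests the file read from disk had 0 bytes rather than being merely malformed. If that is the case, the empty files can be found from their size alone, without decoding anything. A minimal sketch, assuming all images sit under one root directory and use common extensions (the extension set and the name `find_empty_files` are illustrative, not part of mmocr or UnrealText):

```python
import os
from pathlib import Path

# Extensions to check; adjust to match the dataset (assumption).
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def find_empty_files(root, exts=IMAGE_EXTS):
    """Return sorted paths of image files under `root` whose size is 0 bytes."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Compare extensions case-insensitively (".JPG" matches ".jpg").
            if path.suffix.lower() in exts and path.stat().st_size == 0:
                empty.append(str(path))
    return sorted(empty)
```

For example, `for p in find_empty_files("UnrealText"): print(p)`, where `"UnrealText"` is a placeholder for wherever the dataset was extracted. Checking sizes only takes a `stat` call per file, so it is much faster than decoding the whole dataset, but it will not catch files that are non-empty yet truncated.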

Jyouhou commented 2 years ago

Can you try:

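    import mmcv
    from cv2 import IMREAD_COLOR

    try:
        with open(img_name, 'rb') as f:
            img_buff = f.read()
        img = mmcv.imfrombytes(img_buff, IMREAD_COLOR)
    except Exception:  # unlike a bare except, this does not swallow KeyboardInterrupt
        print(img_name)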

to locate the potentially corrupted images?
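The try/except approach only reports files whose decoding raises. With the OpenCV backend, `cv2.imdecode` returns `None` instead of raising when the buffer is non-empty but cannot be parsed as an image, so `mmcv.imfrombytes` may hand back `None` for such files. A minimal sketch that records both failure modes; to keep it independent of any particular library, the decoder is passed in as a function (the names `scan_images` and `Decoder` are hypothetical):

```python
from typing import Any, Callable, Iterable, List, Tuple

# A decoder takes raw file bytes and returns an image, returns None, or raises.
Decoder = Callable[[bytes], Any]


def scan_images(paths: Iterable[str], decode: Decoder) -> List[Tuple[str, str]]:
    """Try to decode every file and collect (path, reason) for each failure.

    Records empty files, decoders that return None (as cv2.imdecode does for
    undecodable bytes), and any exception raised while reading or decoding.
    """
    bad = []
    for path in paths:
        try:
            with open(path, "rb") as f:
                buf = f.read()
            if not buf:
                bad.append((path, "empty file"))
                continue
            if decode(buf) is None:
                bad.append((path, "decoder returned None"))
        except Exception as exc:  # report the file and keep scanning
            bad.append((path, f"{type(exc).__name__}: {exc}"))
    return bad
```

With mmcv installed, a call might look like `scan_images(img_paths, lambda b: mmcv.imfrombytes(b, flag="color"))`, where `img_paths` is a hypothetical list of image file paths. Returning the reason alongside each path makes it easy to tell empty files apart from files that contain data but fail to decode.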

aiboys commented 2 years ago

I found corrupted images in sub_121. I am not sure whether something went wrong when I decompressed the files.
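Whether the empty files came from decompression or were already empty in the download can be checked by listing the archive's members: if the same files are stored with size 0 inside the archive, re-extracting will not help. A minimal standard-library sketch; it assumes the dataset parts are .zip or .tar(.gz/.bz2/.xz) archives, which the thread does not state, and `empty_archive_members` is a hypothetical helper name:

```python
import tarfile
import zipfile
from typing import List


def empty_archive_members(archive_path: str) -> List[str]:
    """List regular-file members stored with size 0 in a zip or tar archive."""
    if zipfile.is_zipfile(archive_path):
        with zipfile.ZipFile(archive_path) as zf:
            return sorted(
                info.filename
                for info in zf.infolist()
                if not info.is_dir() and info.file_size == 0
            )
    if tarfile.is_tarfile(archive_path):
        # Mode "r" transparently handles gzip, bzip2 and lzma compression.
        with tarfile.open(archive_path, "r") as tf:
            return sorted(m.name for m in tf.getmembers() if m.isfile() and m.size == 0)
    raise ValueError(f"unsupported archive format: {archive_path}")
```

For example, `print(empty_archive_members("sub_121.tar.gz"))`, where the archive name is a placeholder. An empty result while the extracted files are empty would point to a local extraction problem; a non-empty result means the files were already empty in the archive.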

Jyouhou commented 2 years ago

It seems these images are empty, probably due to an error during compression and upload. Please remove them.
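One way to "remove" the images without losing track of them is to move them into a separate folder, keeping their relative paths, so they can be restored or replaced later. A minimal sketch, assuming the flagged paths all live under one dataset root (the name `quarantine` and the directory layout are hypothetical). If the training pipeline reads image paths from an annotation file, as mmocr-style loaders typically do, the corresponding entries would also need to be dropped, otherwise the loader will fail on the missing files instead.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def quarantine(paths: Iterable[str], root: str, dest: str) -> List[Path]:
    """Move each file in `paths` (all under `root`) into `dest`, keeping its
    path relative to `root`. Returns the new locations of the moved files."""
    root_p, dest_p = Path(root).resolve(), Path(dest).resolve()
    moved = []
    for p in paths:
        src = Path(p).resolve()
        # relative_to raises ValueError if src is not inside root.
        target = dest_p / src.relative_to(root_p)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(target))
        moved.append(target)
    return moved
```

Moving rather than deleting keeps the operation reversible, which is useful if a fixed copy of the archive becomes available.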