Holmeyoung / crnn-pytorch

PyTorch implementation of CRNN (CNN + RNN + CTCLoss) for OCR in any language.
MIT License
377 stars, 105 forks

create_dataset.py #15

Closed: niddal-imam closed this issue 5 years ago

niddal-imam commented 5 years ago

I am trying to use MJSynth 90K, which contains millions of images. However, when I try to create the dataset, only about 500K images are created. Is there a way to increase this number?

Holmeyoung commented 5 years ago

Hi, the program is designed to create a dataset of at most 999999999 entries, because the keys are nine digits wide.

imageKey = 'image-%09d' % cnt
labelKey = 'label-%09d' % cnt

And the LMDB file can grow up to 1 TB in size.

env = lmdb.open(outputPath, map_size=1099511627776)
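As a quick check of the two limits quoted above, here is a minimal sketch in plain Python (it does not need lmdb). The variable names `max_index`, `image_key`, `label_key` and `map_size` are only for illustration; the format strings and the number are copied from the snippets above.

```python
# Key format from the snippet above: nine zero-padded digits.
max_index = 10**9 - 1  # largest count that still fits in nine digits
image_key = 'image-%09d' % max_index
label_key = 'label-%09d' % 1

# map_size passed to lmdb.open above, expressed in binary units.
map_size = 1099511627776
assert map_size == 1 << 40  # exactly 1 TiB

print(image_key)          # image-999999999
print(label_key)          # label-000000001
print(map_size / 2**40)   # 1.0
```

Both limits are far above the roughly 800K entries in this run, so neither the key width nor the map size should be what stops the script.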

So I guess you may have run out of disk space.

Use df path/to/your/file/folder to see which filesystem the folder is mounted on.

Use df -h to see the details in human-readable units.
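The same check can be done from Python with the standard library. This is a minimal sketch; `free_space_gib` is a hypothetical helper name, and "." is a placeholder for the LMDB output folder.

```python
import shutil


def free_space_gib(path):
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / 2**30


if __name__ == "__main__":
    # "." is a placeholder; point it at the LMDB output folder instead.
    print("free: %.1f GiB" % free_space_gib("."))
```

Running this before create_dataset.py gives a rough idea of whether the output can fit on that filesystem.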

If there are still problems, we can solve it together. Good luck~

niddal-imam commented 5 years ago

I checked the disk space. [image]

I am still getting this error:

Written 620000 / 802734
Traceback (most recent call last):
  File "tool/create_dataset.py", line 125, in <module>
    createDataset(args.out, image_path_list, label_list)
  File "tool/create_dataset.py", line 62, in createDataset
    if not checkImageIsValid(imageBin):
  File "tool/create_dataset.py", line 14, in checkImageIsValid
    imgH, imgW = img.shape[0], img.shape[1]
AttributeError: 'NoneType' object has no attribute 'shape'

Holmeyoung commented 5 years ago

Hi, I have fixed it. [image]

By the way, you can print the image path when the program throws the error. Maybe a txt, json, or some other non-image file exists in your folder.
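One way to find the offending files up front is sketched below. This is not the fix that was committed to the repository; `find_undecodable` and `fake_decode` are hypothetical names, and the decoder is passed in as an argument so the sketch runs without OpenCV.

```python
def find_undecodable(paths, decode):
    """Return the paths for which decode(path) yields None.

    `decode` is any callable that returns the decoded image, or None
    when the file cannot be read as an image.
    """
    bad = []
    for path in paths:
        if decode(path) is None:
            print("cannot decode:", path)
            bad.append(path)
    return bad


# Stand-in decoder for demonstration only: pretends that only .jpg files decode.
def fake_decode(path):
    return object() if path.endswith(".jpg") else None


bad = find_undecodable(["a.jpg", "notes.txt", "b.jpg"], fake_decode)
```

With OpenCV installed, `decode` could be `lambda p: cv2.imread(p, cv2.IMREAD_GRAYSCALE)`, since `cv2.imread` returns None rather than raising when a file cannot be read. Checking the decode result instead of the file extension also catches files that are named like images but are corrupt.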

niddal-imam commented 5 years ago

Perfect.

Many thanks