Dataset takes up only 1000 images

Holmeyoung / crnn-pytorch

Pytorch implementation of CRNN (CNN + RNN + CTCLoss) for all language OCR.

MIT License


Dataset takes up only 1000 images #49

Closed. NavneetSajwan closed this issue 4 years ago.

NavneetSajwan commented 4 years ago

[Screenshot from 2020-04-24 08-03-47]

I have 1740 images and tried to create a dataset from them, but it only picks up 1000 of the images.

ghost commented 4 years ago

No, it created all 1740 samples. Please read the code in create_dataset.py: it only prints a progress message every 1000 samples.

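if cnt % 1000 == 0:
    # every 1000 samples, flush the buffered entries to LMDB and report progress
    writeCache(env, cache)
    cache = {}
    print('Written %d / %d' % (cnt, nSamples))
cnt += 1  # incremented for every sample, not only on multiples of 1000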
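The counting logic can be simulated without LMDB to see why the log stops at 1000. This is a minimal sketch, not the repository's code: `create_dataset_sketch` is a hypothetical helper, a plain dict stands in for the LMDB environment, and it assumes that any leftover entries (fewer than 1000) are written once after the loop together with the total count stored under `num-samples`.

```python
def create_dataset_sketch(n_samples, flush_every=1000):
    """Mimic the write/progress logic of create_dataset.py.

    Hypothetical stand-in: a plain dict replaces the LMDB environment and
    empty bytes replace real image data.
    """
    store = {}   # stands in for the LMDB database
    cache = {}   # entries buffered before the next write
    log = []     # progress lines that would be printed
    cnt = 1
    for _ in range(n_samples):
        cache['image-%09d' % cnt] = b''  # placeholder image bytes
        if cnt % flush_every == 0:
            store.update(cache)          # plays the role of writeCache(env, cache)
            cache = {}
            log.append('Written %d / %d' % (cnt, n_samples))
        cnt += 1
    # Assumption: leftover entries (fewer than flush_every) are written once
    # after the loop, together with the total sample count.
    cache['num-samples'] = str(cnt - 1)
    store.update(cache)
    return store, log


store, log = create_dataset_sketch(1740)
print(log)                   # ['Written 1000 / 1740']
print(store['num-samples'])  # 1740
```

With 1740 images, only one progress line is ever printed, yet all 1740 image entries end up in the store.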

ghost commented 4 years ago

@NavneetSajwan

from dataset import lmdbDataset  # dataset.py from the crnn-pytorch repo

train_dataset = lmdbDataset('path_to_your_created_lmdb')
print(len(train_dataset))

This prints the number of images in the dataset.
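To double-check that nothing was dropped, the printed length can be compared with the number of image files in the source folder. A minimal sketch, assuming the images sit directly in one folder; `count_source_images` and `IMAGE_SUFFIXES` are hypothetical names, and the suffix set is an assumption to adjust for your data.

```python
import tempfile
from pathlib import Path

# Assumption: adjust this set to the image formats in your folder.
IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png', '.bmp'}


def count_source_images(image_dir):
    """Count image files directly inside image_dir (not recursive)."""
    return sum(
        1
        for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


# Small self-check on a throwaway folder: two images and one text file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.jpg', 'b.PNG', 'labels.txt'):
        (Path(tmp) / name).write_bytes(b'')
    demo_count = count_source_images(tmp)
print(demo_count)  # 2
```

Calling `count_source_images('path_to_your_images')` (a placeholder path) next to `len(train_dataset)` should give the same number; a smaller dataset length would suggest that some images were skipped while the dataset was being created.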

NavneetSajwan commented 4 years ago

Thanks.

NavneetSajwan commented 4 years ago

How many epochs does it take to train on this data? I have trained for 500 epochs and still only get 12% accuracy.

ghost commented 4 years ago

Your data is far from enough. I used the dataset at http://www.robots.ox.ac.uk/~vgg/data/text/ (number of training images: 7,224,600; number of validation images: 802,733), and 7 epochs were enough to reach more than 88% accuracy on the validation set.
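Putting the two runs from this thread side by side makes the gap concrete. The sketch below simply multiplies images by epochs to get the total number of samples seen (`sample_views` is a hypothetical helper). Even 500 epochs over 1740 images amount to under 2% of the samples seen in 7 epochs on the larger dataset, and those samples are the same 1740 images repeated, whereas the larger set has roughly 4152 times as many distinct images.

```python
def sample_views(num_images, epochs):
    """Total training samples seen: images per epoch times number of epochs."""
    return num_images * epochs


small = sample_views(1740, 500)      # the run described in the question
large = sample_views(7_224_600, 7)   # the run described in the reply

print(small)                              # 870000
print(large)                              # 50572200
print(round(small / large * 100, 1))      # 1.7 (percent)
print(round(7_224_600 / 1740))            # 4152 (ratio of distinct images)
```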