ThomasDelteil / HandwrittenTextRecognition_MXNet

OCR using MXNet Gluon. The pipeline is composed of a CNN + biLSTM + CTC. The dataset is from: http://www.fki.inf.unibe.ch/databases/iam-handwriting-database. You need to register and get a username and password from their website.

An issue about your OCR data iteration #5

Open zzdang opened 6 years ago

zzdang commented 6 years ago

Hi, your project is cool, but the data iteration in your OCR_LSTM_CTC is very slow. Could you update it? Thank you very much.

jonomon commented 6 years ago

Hi,

Would it be possible for you to include some clarifications? Which data iteration?

zzdang commented 6 years ago

Your "Data Loading" module isn't iterative: you load all the image data and labels up front. If the training dataset is big, the "images_data" in your "Data Loading" module is hard to handle, and training will be very slow.

ThomasDelteil commented 6 years ago

@zzdang, this is true. It was an acceptable trade-off given the small size of the IAM dataset, and it makes it possible to load the pre-processed images quickly.

If your dataset is larger, I would recommend using the ImageFolderDataset available in Gluon, which would let you load each image only when necessary.
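As a rough sketch of the lazy-loading idea (not code from this repository): ImageFolderDataset expects one sub-folder per class, so for text-line images with transcriptions a small custom dataset may fit better. The class below is plain Python and only follows the `__getitem__`/`__len__` protocol that Gluon datasets use. The names `LazyLineImageDataset`, `list_samples`, and the `labels` dict are hypothetical, and the default loader returns raw bytes so that the sketch runs without MXNet installed.

```python
import os


class LazyLineImageDataset:
    """Loads each sample from disk only when it is indexed.

    Hypothetical helper: it follows the __getitem__/__len__ protocol
    used by Gluon datasets but is not part of MXNet or this project.
    """

    def __init__(self, samples, loader=None):
        # samples: list of (image_path, transcription) pairs. Only paths
        # and text are kept in memory, never the pixel data.
        self._samples = list(samples)
        # loader: callable mapping a path to an image. The default returns
        # raw bytes; with MXNet installed, mxnet.image.imread could be
        # passed here instead (an assumption about your setup).
        self._loader = loader if loader is not None else self._read_bytes

    @staticmethod
    def _read_bytes(path):
        with open(path, "rb") as f:
            return f.read()

    def __len__(self):
        return len(self._samples)

    def __getitem__(self, idx):
        # The file is opened here, at access time, not in __init__.
        path, text = self._samples[idx]
        return self._loader(path), text


def list_samples(root, labels):
    """Pair each file name in `labels` (file name -> text) with its path under `root`."""
    return [(os.path.join(root, name), text) for name, text in sorted(labels.items())]
```

An instance could then be handed to `mxnet.gluon.data.DataLoader`, probably with a custom `batchify_fn` since the transcriptions have variable length. The trade-off is the reverse of the current approach: memory use stays flat, but every epoch pays the disk read and decode cost again.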

samar-smida commented 2 years ago

@jonomon @ThomasDelteil I am testing your project, but I get an assertion error even though I put my email and password in credentials.json. The registration form asks for an email, not a username. I found this link in the project, https://fki.tic.heia-fr.ch/DBs/iamDB/iLogin/index.php, but it does not work. Could you please help me?

jonomon commented 2 years ago

You can download the dataset from https://fki.tic.heia-fr.ch/databases/iam-handwriting-database.