awslabs / handwritten-text-recognition-for-apache-mxnet

This repository lets you train neural network models for performing end-to-end full-page handwriting recognition using the Apache MXNet deep learning framework on the IAM Dataset.
Apache License 2.0

FileNotFound #49

Open axaygaid opened 4 years ago

axaygaid commented 4 years ago

Hello guys, I think I have a simple problem: when I launch test_iam_dataset I get this error:

FileNotFoundError: [Errno 2] File /home/roo/sf_workspace/Image Médecine douce/handwritting model/handwritting notebook/ocr/utils/../../dataset/iamdataset/subject/trainset.txt does not exist: '/home/roo/sf_workspace/Image Médecine douce/handwritting model/handwritting notebook/ocr/utils/../../dataset/iamdataset/subject/trainset.txt'

I don't know what kind of file it is.

If someone has an idea, thanks a lot!
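
For readers hitting the same error: the path in the traceback contains `ocr/utils/../..`, which suggests it is built relative to the `ocr/utils` folder. A minimal sketch (standard library only; the helper name `resolve_expected_path` and the shortened example path are made up for illustration) that collapses the `..` segments and reports whether the file is really there:

```python
import os


def resolve_expected_path(raw_path):
    """Collapse '..' segments and report whether the file exists."""
    normalized = os.path.normpath(raw_path)
    return normalized, os.path.isfile(normalized)


# Shortened, hypothetical stand-in for the path shown in the traceback.
raw = "/home/user/project/ocr/utils/../../dataset/iamdataset/subject/trainset.txt"
path, exists = resolve_expected_path(raw)
print(path)    # the same location without the '..' segments
print(exists)  # False unless the file has been placed there
```

Running this with the real path from the error makes it easier to see which folder the loader is actually looking in.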

jonomon commented 4 years ago

Hi @axaygaid Thank you for bringing this to my attention.

Could you try replacing the following cell:

ds = IAMDataset("word", output_data="text")
plot_image_with_text(ds)

with

ds = IAMDataset("word", output_data="text", root="../../dataset/iamdataset")
plot_image_with_text(ds)

and see if it solves your issue?

axaygaid commented 4 years ago

Hi jonomon,

Thanks for the answer. It didn't work; same issue. I think the trainset.txt file can't be downloaded, and I don't know why. I downloaded the different files manually (because of a security restriction) and put them in the dataset/iamdataset folder, so I have the whole IAM dataset but, I think, not the other files such as trainset.txt?

jonomon commented 4 years ago

You should place trainset.txt (as well as testset.txt etc.) in data/iamdataset/subject.

Please let me know if this works.

axaygaid commented 4 years ago

But what kind of data do I have to put in the two txt files? I tried creating them before and the error was:

EmptyDataError: No columns to parse from file

So I have to put some data in them.

Thanks!
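
`EmptyDataError: No columns to parse from file` is the error pandas raises when it is asked to parse a file with no content, so empty placeholder files will not satisfy the loader. A small sketch, assuming the split files are `trainset.txt` and `testset.txt` as named above (the helper `check_split_files` is hypothetical and not part of the repository), that flags missing or empty files before the dataset class tries to read them:

```python
import os


def check_split_files(subject_dir, names=("trainset.txt", "testset.txt")):
    """Return a dict mapping each file name to 'ok', 'missing' or 'empty'."""
    status = {}
    for name in names:
        path = os.path.join(subject_dir, name)
        if not os.path.isfile(path):
            status[name] = "missing"
        elif os.path.getsize(path) == 0:
            status[name] = "empty"
        else:
            status[name] = "ok"
    return status


# Example call; adjust the directory to the local dataset location.
print(check_split_files("dataset/iamdataset/subject"))
```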

jonomon commented 4 years ago

You should download the files here http://www.fki.inf.unibe.ch/DBs/iamDB/tasks/largeWriterIndependentTextLineRecognitionTask.zip
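
If the archive has to be downloaded by hand, it still needs to be unpacked where the loader looks for it. A minimal sketch using only the standard library; the function name `extract_task_zip` and the example paths are assumptions based on the paths mentioned in this thread, not the repository's own code:

```python
import os
import zipfile


def extract_task_zip(zip_path, subject_dir):
    """Extract every member of zip_path into subject_dir and list them."""
    os.makedirs(subject_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(subject_dir)
        return sorted(archive.namelist())


# Assumed locations; adjust both to the local setup.
zip_path = "dataset/iamdataset/largeWriterIndependentTextLineRecognitionTask.zip"
if os.path.isfile(zip_path):
    print(extract_task_zip(zip_path, "dataset/iamdataset/subject"))
else:
    print("Archive not found:", zip_path)
```

The returned list shows which files ended up in the subject folder, which is a quick way to confirm that trainset.txt is among them.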

axaygaid commented 4 years ago

Hey jonomon,

Thanks for the help, it was useful. Now running test_ds = IAMDataset("form_original", train=False) "works", but when I try to plot an image I get nothing, as if nothing is read. When I run a simple len(ds) to check whether there is anything in it, it just returns 0. I'm checking the source to see if something is missing in my setup, but if anyone has any idea...

Thanks a lot!

jonomon commented 4 years ago

Hi @axaygaid,

It is hard for me to debug the issue without any information. What are the contents of data/iamdataset?

axaygaid commented 4 years ago

Hi @jonomon

A simple example: when I run ds = IAMDataset("word", output_data="text"), it prints this: <_io.TextIOWrapper name='/home/roo/sf_workspace/Image Médecine douce/handwritting model/handwritting notebook/ocr/utils/credentials.json' mode='r' encoding='UTF-8'>, so the path is correct, but the output of len(ds) is 0. The contents of the iamdataset folder, from os.listdir("/home/roo/sf_workspace/Image Médecine douce/handwritting model/handwritting notebook/dataset/IAMDataset/"):

['.ipynb_checkpoints', 'ascii.gz', 'forms.txt', 'formsA-D.tgz', 'formsE-H.tgz', 'formsI-Z.tgz', 'image_data-form_original-text0.plk', 'image_data-form_original-text1.plk', 'image_data-form_original-text2.plk', 'image_data-form_original-text3.plk', 'image_data-word-text0.plk', 'image_data-word-text1.plk', 'image_data-word-text2.plk', 'image_data-word-text3.plk', 'largeWriterIndependentTextLineRecognitionTask.zip', 'lines.tgz', 'lines.txt', 'sentences.tgz', 'sentences.txt', 'subject', 'untitled.txt', 'words.tgz', 'words.txt', 'xml', 'xml.tgz']

I don't know if it's clear now? ><

jonomon commented 4 years ago

It seems like the contents are missing a bunch of folders. See the example here: [image]

The IAMDataset class should automatically download the IAM dataset and process the files. Was there something wrong with that step?
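
One way to see what is missing is to summarise the folder: which entries are directories, which are still-packed archives, and how many PNG images each directory holds. A rough diagnostic sketch (the helper `summarise_dataset_dir` is hypothetical, and it does not know which folders the loader expects):

```python
import os


def summarise_dataset_dir(root):
    """Split root's entries into archives and directories with PNG counts."""
    archives, png_counts = [], {}
    for entry in sorted(os.listdir(root)):
        full = os.path.join(root, entry)
        if os.path.isdir(full):
            # Count PNG files anywhere below this directory.
            png_counts[entry] = sum(
                name.lower().endswith(".png")
                for _, _, files in os.walk(full)
                for name in files
            )
        elif entry.endswith((".tgz", ".gz", ".zip")):
            archives.append(entry)
    return archives, png_counts


root = "dataset/iamdataset"  # assumed location; adjust to the local path
if os.path.isdir(root):
    archives, png_counts = summarise_dataset_dir(root)
    print("Archives:", archives)
    print("PNG files per directory:", png_counts)
```

An archive with no matching directory, or a directory reporting 0 PNG files, would point to an extraction step that has not run.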

axaygaid commented 4 years ago

I downloaded the different parts of the dataset (words, forms, etc.) manually because of a security restriction: I can't directly download big files such as the IAM dataset. I think that's why the processing was not done. After the download, I extracted every archive and put the contents in folders (all the .png files of the forms in form, and so on). I thought that would be enough.

image

That's the iamdataset folder. Maybe I have to preprocess it myself if it doesn't work. I still have the same problem: the pipeline doesn't recognize the different pictures. :/

Thanks if you have an idea. :)

jonomon commented 4 years ago

So you are not using the IAMDataset? If that's the case, you would have to customise the Gluon Dataset to your dataset.

This documentation provides information on how to do it: https://mxnet.apache.org/api/python/docs/tutorials/packages/gluon/data/datasets.html
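
As a rough illustration of what such a custom dataset could look like: a Gluon dataset needs `__getitem__` and `__len__`. The sketch below pairs image paths with transcriptions. The class name `HandwritingPairs` and the sample data are invented, image loading is left out, and the `object` fallback exists only so the snippet runs without MXNet installed.

```python
try:
    from mxnet.gluon.data import Dataset  # real base class when MXNet is available
except ImportError:
    Dataset = object  # fallback so this sketch runs without MXNet


class HandwritingPairs(Dataset):
    """Hypothetical dataset of (image_path, text) pairs."""

    def __init__(self, pairs):
        self._pairs = list(pairs)

    def __getitem__(self, idx):
        # A real implementation would load and return the image here.
        return self._pairs[idx]

    def __len__(self):
        return len(self._pairs)


# Invented sample data, for illustration only.
ds = HandwritingPairs([("page_001.png", "hello world"), ("page_002.png", "second line")])
print(len(ds))  # 2
print(ds[0])    # ('page_001.png', 'hello world')
```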

mahin003 commented 3 years ago

If anybody has run it on Google Colab, please share the edited iam_dataset.py with me: mahinqureship1@gmail.com

JPremnath06 commented 3 years ago

Please share the iam_dataset.py file with me (to use in Colab): jpremnath06@outlook.com

sambbhavgarg commented 3 years ago

Hey @jonomon, first off, thanks a lot for this repo.

There seems to be an issue accessing the largeWriterIndependentTextLineRecognitionTask.zip file at http://www.fki.inf.unibe.ch/DBs/iamDB/tasks/largeWriterIndependentTextLineRecognitionTask.zip (it returns a 404 error).

Could you update the link in the code, or point us to the file so it can be downloaded manually?

Thanks, Sambbhav

jonomon commented 3 years ago

Hi @sambbhavgarg, you can download it here: https://fki.tic.heia-fr.ch/static/zip/largeWriterIndependentTextLineRecognitionTask.zip

Regards, Jonathan