KBNLresearch / ochre

Toolbox for OCR post-correction
Apache License 2.0

Are the test and training data formats different? #19

Open tejakundaikar opened 3 years ago

tejakundaikar commented 3 years ago

Could you please provide a sample of the test data format?

jvdzwaan commented 3 years ago

The data format is specified in the README and should be the same for training and test.

tejakundaikar commented 3 years ago

Thank you for your reply.

I supplied the test data using the following command:

python3 -m ochre.lstm_synced_correct_ocr model/0.1861-40.hdf5 gs.json ocr.json

Here, ocr.json contains data in this format: { "ocr":["\u2018", "\u0930", "\u0947", "\u0921", "\u093f", "\u092f", "\u094b"]}
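The JSON quoted in this comment has a single "ocr" key whose value is a list with one string per Unicode code point. A minimal sketch of producing that structure from plain text follows. It only mirrors the example in this comment and has not been checked against the format described in the ochre README; the function name to_ocr_json is hypothetical.

```python
import json


def to_ocr_json(text: str) -> str:
    """Serialise text as {"ocr": [...]}, one list item per Unicode code point.

    Hypothetical helper: it reproduces the structure quoted in the issue,
    not a format confirmed by the ochre README.
    """
    # list(text) splits on code points, so Devanagari vowel signs such as
    # U+0947 and U+093F become separate items, as in the quoted example.
    # json.dumps escapes non-ASCII characters as \uXXXX by default.
    return json.dumps({"ocr": list(text)})


sample = "\u2018\u0930\u0947\u0921\u093f\u092f\u094b"
print(to_ocr_json(sample))
```

Because json.dumps escapes non-ASCII characters by default (ensure_ascii=True), the printed output uses the same \uXXXX escapes as the example in the comment.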

I also tried supplying the test data as .txt files:

python3 -m ochre.lstm_synced_correct_ocr model/0.1861-40.hdf5 gs.txt ocr.txt

In both cases I get the following error:
ValueError: Input 0 is incompatible with layer sequential: expected shape=(None, None, 171), found shape=[None, 25, 3825]

Could you please help me supply test data to the model?
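Keras reports sequence inputs as (batch, timesteps, features). For a character-level model fed one-hot vectors, the features axis usually equals the size of the character vocabulary, so the error can be read as "the model expects 171 values per position but received 3825" (with 25 positions per window). The thread does not settle whether that comes from the data or from changed dependencies. The sketch below, using hypothetical helpers (build_vocab, one_hot_windows, check_feature_dim) that are not part of ochre, shows how the last dimension follows from the vocabulary and how to check it before calling the model.

```python
def build_vocab(texts):
    """Map each distinct character to an index; sorted so the mapping is reproducible."""
    return {ch: i for i, ch in enumerate(sorted(set("".join(texts))))}


def one_hot_windows(text, vocab, seq_len):
    """Encode text as sliding windows of one-hot vectors.

    Result shape: (number_of_windows, seq_len, len(vocab)).
    A character missing from vocab raises KeyError.
    """
    windows = []
    for start in range(len(text) - seq_len + 1):
        window = []
        for ch in text[start:start + seq_len]:
            vec = [0] * len(vocab)
            vec[vocab[ch]] = 1
            window.append(vec)
        windows.append(window)
    return windows


def shape(encoded):
    """Shape of a non-empty nested list encoding as (windows, timesteps, features)."""
    return (len(encoded), len(encoded[0]), len(encoded[0][0]))


def check_feature_dim(encoded, expected):
    """Fail early, with a clear message, if the features axis does not match the model."""
    found = shape(encoded)[2]
    if found != expected:
        raise ValueError(f"expected last dimension {expected}, found {found}")


vocab = build_vocab(["abcab"])
x = one_hot_windows("abcab", vocab, seq_len=2)
print(shape(x))  # (4, 2, 3)
check_feature_dim(x, expected=len(vocab))  # passes: encoded with the same vocabulary
```

The design point is that the vocabulary used to encode test input must be the same one the model was trained with; rebuilding it from different text changes the width of every one-hot vector.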

jvdzwaan commented 3 years ago

My guess is that the error is due to updates in the dependencies and not necessarily to problems with the data. This can be expected from software that isn't actively being developed.
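If dependency drift is the cause, a first step is to see which versions are actually installed and compare them with whatever versions the repository pins, if it pins any (the thread does not say). A minimal sketch using the standard library's importlib.metadata (Python 3.8+); the package names listed are assumptions about what a Keras-based project might depend on.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: installed version string, or None if not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


# Package names are illustrative assumptions, not taken from ochre's requirements.
print(installed_versions(["tensorflow", "keras", "numpy"]))
```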