KBNLresearch / ochre

Toolbox for OCR post-correction
Apache License 2.0
About OCR_aligned and Lost or missing text #11

Open USTCHJY opened 6 years ago

USTCHJY commented 6 years ago

Hi, I'm working on OCR post-correction and ochre has helped me a lot, but I still have some questions and look forward to your reply. When using ochre for OCR post-correction, we only have the OCR_input. How can I get OCR_aligned from OCR_input without a gold standard (gs)? And if that is not possible, how should lost or missing text be handled without aligned text? Thanks!

jvdzwaan commented 6 years ago

The task ochre performs is a supervised machine learning task. So, without a gold standard, you can't create aligned data or train a (supervised) model.
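To illustrate why alignment needs both texts, here is a minimal sketch of character-level alignment using Python's standard `difflib`. This is not ochre's actual alignment code; the `align` function and the `@` padding symbol are hypothetical names chosen for the example. The point is only that the aligned OCR string is defined relative to the gold standard, so it cannot be derived from OCR_input alone.

```python
from difflib import SequenceMatcher
from typing import Tuple

GAP = "@"  # hypothetical padding symbol; an assumption, not necessarily what ochre uses


def align(ocr: str, gs: str, gap: str = GAP) -> Tuple[str, str]:
    """Pad OCR and gold standard text so that both have the same length.

    Assumes the gap symbol does not occur in either input.
    """
    ocr_out, gs_out = [], []
    matcher = SequenceMatcher(None, ocr, gs, autojunk=False)
    for _tag, i1, i2, j1, j2 in matcher.get_opcodes():
        a, b = ocr[i1:i2], gs[j1:j2]
        # Pad the shorter side of each block so the blocks stay in step.
        width = max(len(a), len(b))
        ocr_out.append(a.ljust(width, gap))
        gs_out.append(b.ljust(width, gap))
    return "".join(ocr_out), "".join(gs_out)


ocr_aligned, gs_aligned = align("Helo wrld", "Hello world")
print(ocr_aligned)
print(gs_aligned)
```

Characters missing from the OCR text show up as padding in the aligned OCR string, which is only possible because the gold standard says where they should have been.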

USTCHJY commented 6 years ago

Sorry, maybe I did not express myself clearly. I mean: after supervised training (for the training data, we must have a gold standard), how can I use the trained ochre model for actual OCR post-correction tasks? For actual tasks we usually don't have a gold standard, and we want corrected text that is similar to the gold standard. In that case, how can I get OCR_aligned from the raw OCR_input of the actual task? Thanks!

jvdzwaan commented 6 years ago

The README specifies how to use a trained model to do post correction: https://github.com/KBNLresearch/ochre#ocr-post-correction

If you want to calculate performance for this text, you'd still need to have ground truth/gold standard.
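As a rough illustration of why performance measurement depends on the ground truth, here is a minimal sketch of a character error rate (CER) computation based on Levenshtein distance. It is a generic metric, not ochre's own evaluation code, and the function names `levenshtein` and `cer` are hypothetical. The gold standard text is a required input, so the score cannot be computed without it.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance (insertions, deletions, substitutions) between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(hypothesis: str, gold: str) -> float:
    """Character error rate of a hypothesis text relative to the gold standard."""
    if not gold:
        raise ValueError("gold standard text must not be empty")
    return levenshtein(hypothesis, gold) / len(gold)


# Comparing the score before and after correction shows whether the model helped.
print(cer("Helo wrld", "Hello world"))
```

Running the same function on the raw OCR text and on the corrected text, each against the same gold standard, gives a before/after comparison.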