KBNLresearch / ochre

Toolbox for OCR post-correction
Apache License 2.0

Working without aligned file #3

Open omrishsu opened 6 years ago

omrishsu commented 6 years ago

Hi, I'm conducting research on OCR corpora, and I would like to use this project to evaluate how differences in the training corpus affect the quality of the post-correction. However, I have OCR files and GS (gold standard) files, but not the aligned JSON files that are needed. Is there a way to generate them (maybe with a Smith-Waterman algorithm?) or to work without them?

Thanks, Omri

jvdzwaan commented 6 years ago

Thank you for your interest in ochre! Whether you need the aligned files depends on what you want to do (how you want to calculate performance). For calculating character error rate and word error rate, you don't need them. For doing word level error analysis, you need them, but if you use the workflows provided by ochre, they are generated automatically.
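To illustrate why the error rates need no aligned files, here is a minimal sketch that computes both directly from a gold standard string and an OCR string using edit distance. It is not ochre's own implementation; the function names `edit_distance`, `character_error_rate`, and `word_error_rate` are hypothetical, and words are assumed to be separated by whitespace.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: item of ref missing from hyp
                curr[j - 1] + 1,     # insertion: extra item in hyp
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def character_error_rate(gs, ocr):
    """Character edits needed to turn the OCR text into the gold standard,
    divided by the gold standard length in characters."""
    if not gs:
        raise ValueError("gold standard text must not be empty")
    return edit_distance(gs, ocr) / len(gs)


def word_error_rate(gs, ocr):
    """Same as above, but computed over whitespace-separated words."""
    gs_words = gs.split()
    if not gs_words:
        raise ValueError("gold standard text must not contain zero words")
    return edit_distance(gs_words, ocr.split()) / len(gs_words)


if __name__ == "__main__":
    gs_text = "the quick brown fox"
    ocr_text = "the qu1ck brown fox"
    print(character_error_rate(gs_text, ocr_text))
    print(word_error_rate(gs_text, ocr_text))
```

Both rates only need the total number of edits, not the position of each edit, which is why an alignment is only required for the word level error analysis.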

I am in the process of putting the workflows online and providing documentation. So, I hope you can wait a little longer.

Is your dataset publicly available? If so, I'd like to include it in my list :)

omrishsu commented 6 years ago

Hi, sorry for disappearing (I was working on another research project). I've updated my question in a separate issue: https://github.com/KBNLresearch/ochre/issues/4 Thanks!