Eric013 / isri-ocr-evaluation-tools

Automatically exported from code.google.com/p/isri-ocr-evaluation-tools

Wrapper script to use tools with UTF-8 input files #2

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
Attached is a wrapper script that makes the tools here run correctly on UTF-8 
input. Of course it would be much better to make them handle UTF-8 internally, 
but that's quite a big job, and this script works perfectly well.

Use it like this:

  utf8toolwrap.sh accuracy ground.txt ocr.txt # run the 'accuracy' tool

It probably isn't needed for tools like accsum that just postprocess the output 
from the real tests, but 'accuracy', the most important tool, suddenly becomes 
useful with it!
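
The attached script isn't reproduced in this thread, so what follows is only a rough sketch of one way a wrapper like this could work, not the script itself. It assumes the tools compare text one byte at a time, so each distinct non-ASCII character is remapped to a spare single-byte code before the tool runs and mapped back afterwards. The function names are made up for illustration.

```python
def build_byte_map(*texts):
    """Assign each distinct non-ASCII character a spare byte code (0x80-0xFF).

    Hypothetical helper: assumes the tool treats every byte as one character.
    """
    chars = sorted({ch for text in texts for ch in text if ord(ch) > 0x7F})
    if len(chars) > 128:
        raise ValueError("more than 128 distinct non-ASCII characters")
    return {ch: 0x80 + i for i, ch in enumerate(chars)}


def encode_single_byte(text, byte_map):
    """Turn text into bytes, one byte per character, using the map."""
    # ASCII characters keep their own code; mapped characters use the spare code.
    return bytes(byte_map.get(ch, ord(ch)) for ch in text)


def decode_single_byte(data, byte_map):
    """Reverse encode_single_byte, restoring the original characters."""
    reverse = {code: ch for ch, code in byte_map.items()}
    return "".join(reverse.get(b, chr(b)) for b in data)


# Example inputs written with escapes: a five-letter Greek word with and
# without the accent on its second letter.
ground = "\u03bb\u03cc\u03b3\u03bf\u03c2"
ocr = "\u03bb\u03bf\u03b3\u03bf\u03c2"

byte_map = build_byte_map(ground, ocr)
ground_bytes = encode_single_byte(ground, byte_map)
ocr_bytes = encode_single_byte(ocr, byte_map)
```

In a real wrapper the remapped bytes would presumably be written to temporary files and passed to the tool in place of the originals, and any characters the tool reports would need mapping back with something like decode_single_byte. The same map has to be built from both the ground truth and the OCR output, otherwise the two files wouldn't agree on which byte stands for which character.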

Original issue reported on code.google.com by nick.wh...@durham.ac.uk on 23 Feb 2013 at 11:12

GoogleCodeExporter commented 8 years ago
I meant to mention, I recommend this be added to the source repository, as it's 
very useful indeed :)

Original comment by nick.wh...@durham.ac.uk on 23 Feb 2013 at 11:15

GoogleCodeExporter commented 8 years ago
Attaching an updated version of the script, which doesn't choke on input files 
in subdirectories (oops).

Original comment by nick.wh...@durham.ac.uk on 24 Feb 2013 at 12:07

GoogleCodeExporter commented 8 years ago

Original comment by nick.wh...@durham.ac.uk on 24 Feb 2013 at 12:11

GoogleCodeExporter commented 8 years ago
I just forked this codebase and added the wrapper patch (renamed to 
ocrtoolutf8): 
https://gitorious.org/ancient-greek-training-for-tesseract/ocr-evaluation-tools

Original comment by nick.wh...@durham.ac.uk on 27 Feb 2013 at 11:31