factful / ocr_testing

Scripts and results from our OCR roundup, available on Source
https://source.opennews.org/articles/so-many-ocr-options/

Address Problem: different tools handle all of this differently #14

Closed amandabee closed 5 years ago

amandabee commented 5 years ago

As noted on #13: "Just noting again that the different tools handle all of this differently. Google, Azure, Abbyy and Tesseract all automatically rotate the pages."

This seems to reflect a larger concern than the fairly straightforward task of re-running the now-rotated Yanukovych document through Ocropus and Calamari.

Where and how should this be addressed?

knowtheory commented 5 years ago

This perhaps reflects a lack of clarity on my part on what the baseline for comparison should be.

Are we presenting best-case scenarios, where the data and input are as clean as can be, to isolate the differences between the recognition engines? Or are we comparing how well the tools help users handle the processing needed to get workable results back?

knowtheory commented 5 years ago

I added a sentence to the OCRopus section and two to the Attention-OCR section noting that they're not batteries-included tools.