18F / 2015-foia-hub

A consolidated FOIA request hub.

Test OCR for quality #735

Closed: geramirez closed this 9 years ago

geramirez commented 9 years ago

Notes: I reconverted the 2,300+ documents on keystone from State and FOIAonline. Of these, only 7 needed to be OCR'd. All of the documents appeared to be scanned memos or emails and averaged around 610 extracted words. Next steps: test older documents.
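The comment does not say how the 7 documents were identified as needing OCR. One common approach, sketched here as an assumption rather than the toolkit's actual logic, is to extract text directly first and send a file to OCR only when its text layer comes back nearly empty. The cutoff `MIN_WORDS_FOR_TEXT_LAYER` and the helper names are hypothetical, and the word count is a simple whitespace split.

```python
from statistics import mean
from typing import Dict, Iterable, List

# Hypothetical cutoff, not taken from doc_processing_toolkit: a file whose
# text layer yields fewer words than this is treated as a scanned image.
MIN_WORDS_FOR_TEXT_LAYER = 25


def word_count(text: str) -> int:
    """Count whitespace-separated tokens in a block of extracted text."""
    return len(text.split())


def needs_ocr(extracted_text: str, min_words: int = MIN_WORDS_FOR_TEXT_LAYER) -> bool:
    """Return True when direct text extraction produced too few words,
    which suggests the file is an image scan with no usable text layer."""
    return word_count(extracted_text) < min_words


def flag_for_ocr(extracted: Dict[str, str]) -> List[str]:
    """Return the sorted names of documents whose extracted text is too sparse."""
    return sorted(name for name, text in extracted.items() if needs_ocr(text))


def average_words(texts: Iterable[str]) -> float:
    """Mean word count across the given texts; 0.0 for an empty input."""
    counts = [word_count(t) for t in texts]
    return float(mean(counts)) if counts else 0.0
```

For example, given `{"memo_a.pdf": "word " * 600, "email_b.pdf": "word " * 620, "scan_c.pdf": ""}`, `flag_for_ocr` returns `["scan_c.pdf"]`, and `average_words` over the two non-empty texts returns `610.0`. Keeping the average separate from the flagging step lets it be computed over whichever set of texts matters, such as the OCR output of the flagged files.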

geramirez commented 9 years ago

Accuracy + Speedup -- https://github.com/18F/doc_processing_toolkit/pull/8

geramirez commented 9 years ago

@rjmajma I've added documentation on the OCR methodology and the sources I researched.