OregonDigital / oregondigital_2

The active development on Oregon Digital 2 is in the https://github.com/OregonDigital/OD2 repo.

Feature/index full text #251

Closed. straleyb closed this 9 years ago.

straleyb commented 9 years ago

Fixes #135, Fixes #230

This adds a new group of objects that parse OCR documents into pages and words. The document model uses them to append a document's full text to its Solr document, which makes that text searchable.
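
As a rough illustration of the idea (not the code in this pull request), the flow might look like the sketch below. It assumes the OCR output is plain text with a form feed (`"\f"`) between pages, which is how `pdftotext` marks page breaks by default. The module `OcrSketch`, the `Page` struct, and the field name `full_text_tesim` are all hypothetical names chosen for the example.

```ruby
# Minimal sketch under stated assumptions; every name here is hypothetical
# and does not correspond to the classes added by this pull request.
module OcrSketch
  # A page knows its 1-based number and the words found on it.
  Page = Struct.new(:number, :words) do
    def text
      words.join(" ")
    end
  end

  # Assumes plain text in which a form feed ("\f") separates pages.
  # Words are split on any run of whitespace.
  def self.parse(raw_text)
    raw_text.split("\f").each_with_index.map do |page_text, index|
      Page.new(index + 1, page_text.split)
    end
  end

  # Returns a copy of solr_doc with the joined page text stored under
  # a hypothetical field name. Pages without words are skipped.
  def self.append_full_text(solr_doc, pages, field = "full_text_tesim")
    full_text = pages.map(&:text).reject(&:empty?).join("\n")
    solr_doc.merge(field => full_text)
  end
end

pages = OcrSketch.parse("Hello world\fSecond page here\f")
doc = OcrSketch.append_full_text({ "id" => "example:1" }, pages)
puts doc["full_text_tesim"]
```

Keeping pages and words as separate objects, rather than storing one large string, leaves room for page-level features later while still letting the document model flatten everything into a single searchable field.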

straleyb commented 9 years ago

This should be good to go now.

tpendragon commented 9 years ago

@straleyb https://coveralls.io/builds/3306805/source?filename=lib%2Foregon_digital%2Fderivatives%2Fprocessors%2Focr_processor.rb

straleyb commented 9 years ago

@terrellt Should I add a test to check that the OCR processor correctly displays an error when pdftotext fails? I'm slightly confused about what you are pointing to.

straleyb commented 9 years ago

Alright, coverage is fixed.