Sometimes a PDF document we're uploading already has an OCR text layer. Currently Aleph OCRs the document again anyway, so the extracted text ends up with duplicates.
Ideally, we should provide a way to tell Aleph to skip OCR for a document when uploading through alephclient or the UI.
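
To make the request concrete, here is a rough sketch of the behaviour I have in mind. Everything in it is hypothetical: `UploadOptions`, `skip_ocr`, and `text_for_page` are names made up for illustration and are not part of Aleph or alephclient, and the "append OCR output to the embedded text" branch is only my assumption about why the duplicates appear.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UploadOptions:
    """Hypothetical per-upload settings; not an existing Aleph/alephclient type."""
    skip_ocr: bool = False  # assumed opt-out flag chosen by the uploader


def text_for_page(embedded_text: str,
                  options: UploadOptions,
                  run_ocr: Callable[[], str]) -> str:
    """Return the text to index for one PDF page.

    embedded_text: text already present in the page's text layer.
    run_ocr: callable that performs OCR and returns the recognized text.
    """
    if options.skip_ocr:
        # Trust the existing text layer and never invoke OCR.
        return embedded_text
    # Assumed stand-in for the behaviour described in this issue: OCR runs
    # regardless and its output is kept alongside the embedded text.
    ocr_text = run_ocr()
    return "\n".join(part for part in (embedded_text, ocr_text) if part)
```

With the flag set, the OCR step is skipped entirely, so the page text is only whatever the PDF already contained. The same flag could be exposed as a command-line option in alephclient and as a checkbox in the upload dialog.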