ciur / papermerge

Open Source Document Management System for Digital Archives (Scanned Documents)
https://papermerge.com
Apache License 2.0
2.41k stars, 257 forks

Exclude document from OCR #598

Closed (thndrbck closed 4 months ago)

thndrbck commented 4 months ago

Forms filled in by hand don't need Optical Character Recognition. The OCR database would fill up with form field labels. Also, disk storage will fill up with unnecessary OCR duplicates.

If you could include a check box when uploading a file so that it is marked for no OCR, that would be helpful. A toggle to turn off OCR when batch uploading documents would also be helpful.

ciur commented 4 months ago

Thank you for opening this ticket!

This feature makes perfect sense and is relatively easy to implement. It will be implemented as part of the next release, 3.1, which will be out in a couple of weeks.

thndrbck commented 4 months ago

Re: Did you mean excluding the entire document from being OCRed, which is exactly what https://github.com/ciur/papermerge/issues/598 asks for?

Or did you really mean excluding specific pages from being OCRed? The latter, i.e. excluding specific pages from OCR, is not possible to implement. It is either the entire document (i.e. all pages in the document) or nothing.


I meant not OCRing the entire document.
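
For illustration only, the requested behavior (a per-document opt-out that covers all pages or none, plus a single toggle for batch uploads) could be sketched roughly as below. This is not Papermerge's code or API: `Document`, `OcrQueue`, `upload`, `upload_batch`, and the `skip_ocr` parameter are all hypothetical names made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical model, not Papermerge's actual schema.
    title: str
    page_count: int
    ocr: bool = True  # per-document flag: applies to all pages or none


@dataclass
class OcrQueue:
    # Stand-in for whatever background worker would run OCR.
    pending: list = field(default_factory=list)

    def enqueue(self, doc: Document) -> None:
        self.pending.append(doc.title)


def upload(queue: OcrQueue, title: str, page_count: int,
           skip_ocr: bool = False) -> Document:
    """Store a document and schedule OCR unless the uploader opted out."""
    doc = Document(title=title, page_count=page_count, ocr=not skip_ocr)
    if doc.ocr:
        queue.enqueue(doc)
    return doc


def upload_batch(queue: OcrQueue, files, skip_ocr: bool = False):
    """Apply one skip_ocr toggle to every (title, page_count) pair in a batch."""
    return [upload(queue, title, pages, skip_ocr) for title, pages in files]
```

With this shape, a hand-filled form uploaded with `skip_ocr=True` is stored but never reaches the OCR queue, so no OCR text or duplicate files would be produced for it.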

ciur commented 4 months ago

Added PR#332

Feature will be part of the 3.1.0 release.