Closed deajan closed 1 week ago
Yes, Papermerge uses Tesseract.
Is there any plugin system or something else where I could plug in another OCR engine, given that it handles hOCR?
Well, not really. There is no official "plug-in system", but the coupling with Tesseract is very thin, and it is easy to add support for almost any OCR engine.
Basically, the OCR part is a separate application, called OCR-worker, which is connected to the main app only via Celery messages.
The whole dependency on the OCR engine is just this module: https://github.com/papermerge/ocr-worker/blob/main/ocrworker/ocr.py (of course, I don't count system dependencies, which are assumed to be present in the Docker image). The OCR entry points are in the tasks.py module.
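To make the "thin coupling" idea concrete, here is a minimal sketch of how an engine switch could look. It is not taken from ocr-worker: the `ENGINES` registry, `register`, `ocr_page`, and the function signatures are all hypothetical names chosen for illustration. The only real interface used is the Tesseract command line, where the `hocr` config makes it write `<outputbase>.hocr`.

```python
import subprocess
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical signature: (page image, output base path, language) -> hOCR file path.
OcrFunc = Callable[[Path, Path, str], Path]

# Hypothetical registry of engines, keyed by name. Not part of Papermerge.
ENGINES: Dict[str, OcrFunc] = {}


def register(name: str) -> Callable[[OcrFunc], OcrFunc]:
    """Decorator that adds an OCR function to the registry under `name`."""
    def decorator(func: OcrFunc) -> OcrFunc:
        ENGINES[name] = func
        return func
    return decorator


def tesseract_command(image: Path, output_base: Path, lang: str) -> List[str]:
    """Build the Tesseract CLI call; the `hocr` config writes <output_base>.hocr."""
    return ["tesseract", str(image), str(output_base), "-l", lang, "hocr"]


@register("tesseract")
def run_tesseract(image: Path, output_base: Path, lang: str) -> Path:
    # Requires the `tesseract` binary on PATH; raises CalledProcessError on failure.
    subprocess.run(tesseract_command(image, output_base, lang), check=True)
    return Path(f"{output_base}.hocr")


def ocr_page(engine: str, image: Path, output_base: Path, lang: str = "eng") -> Path:
    """Run the named engine on one page image and return the hOCR file path."""
    if engine not in ENGINES:
        raise ValueError(f"unknown OCR engine: {engine!r}")
    return ENGINES[engine](image, output_base, lang)
```

Under these assumptions, another engine would only need to register its own function that takes the same arguments and returns the path of an hOCR file; how its output gets converted to hOCR is outside this sketch.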
Side question: Does Papermerge handle user/group permissions? If so, can they be assigned automagically to new documents, according to tags or something similar?
Well, yes and no.
Yes, Papermerge handles users, groups, and permissions, but not in the sense you probably mean.
Your question, I guess, is about permissions per object/resource (that is, per specific document or folder). No, per-object (per-document or per-folder) permissions are not there yet. I will add them at the beginning of 2025.
Thank you for your quick reply. I've worked with OCRmyPDF to make EasyOCR work under Celery and headless. I guess this work would make it compatible with Papermerge.
Would you mind briefly explaining the permission system in Papermerge? My use case is sharing documents with my family:
Is that something I can achieve with Papermerge easily?
My use case is sharing documents with my family: .... Is that something I can achieve with Papermerge easily?
No, not now. Currently, permissions are there to limit users to specific URLs (the technical term is "endpoints").
In other words, currently you can say: "user coco does not have permissions to access GET /groups/, POST /groups/, GET /groups/
As I mentioned above, per-object permissions, which is what you need when you want to grant access to a specific folder or document, will come soon, at the beginning of 2025 (I think it will be February 2025).
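For readers unfamiliar with the distinction, endpoint-level permissions can be pictured roughly as below. This is only an illustrative sketch, not Papermerge's actual implementation: the `ALLOWED` table, the `can_access` helper, and the `/documents/` path are hypothetical. The check looks only at the HTTP method and URL, never at which document or folder is being requested, which is why it cannot express "share this folder with my family".

```python
from typing import Dict, Set, Tuple

# An endpoint is identified by (HTTP method, URL path).
Endpoint = Tuple[str, str]

# Hypothetical permission table: which endpoints each user may call.
ALLOWED: Dict[str, Set[Endpoint]] = {
    "coco": {("GET", "/documents/")},
    "admin": {("GET", "/groups/"), ("POST", "/groups/"), ("GET", "/documents/")},
}


def can_access(user: str, method: str, path: str) -> bool:
    """Return True if `user` may call `method` on `path`; unknown users get nothing."""
    return (method.upper(), path) in ALLOWED.get(user, set())
```

In this sketch, `can_access("coco", "GET", "/groups/")` is `False` because that endpoint is not in coco's set, regardless of which group is involved.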
Thank you for the insight :) I'll see if I can chip in a bit of time to integrate EasyOCR into Papermerge, since its results are generally superior to Tesseract's.
Hello,
I recently tried paperless-ngx and found that it does not fit my use cases. Mostly, I've spent some time developing support for EasyOCR in paperless-ngx, only to find out that the developers aren't fond of supporting alternative OCR engines.
As far as I understand, Papermerge uses Tesseract? Is there any plugin system or something else where I could plug in another OCR engine, given that it handles hOCR?
Thanks. Side question: Does Papermerge handle user/group permissions? If so, can they be assigned automagically to new documents, according to tags or something similar?