eikek / docspell

Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort.
https://docspell.org
GNU Affero General Public License v3.0
1.65k stars, 127 forks

Add more languages #2779

Open tiborrr opened 2 months ago

tiborrr commented 2 months ago

I currently use the following languages in another application. I think it would be good to add them, because then we would cover the entirety of Europe; I have invoices from all over Europe. I would like to know your thoughts on this. Perhaps we should think of a way to let users add more languages dynamically as they see fit, because adding these by default would increase the image by about 14 * approximately 8 MB = 112 MB (installed size).

+   tesseract-ocr-data-bel \
+   tesseract-ocr-data-bos \
+   tesseract-ocr-data-bul \
    tesseract-ocr-data-ces \
    tesseract-ocr-data-dan \
    tesseract-ocr-data-deu \
+   tesseract-ocr-data-ell \
    tesseract-ocr-data-eng \
    tesseract-ocr-data-est \
    tesseract-ocr-data-fin \
    tesseract-ocr-data-fra \
    tesseract-ocr-data-heb \
+   tesseract-ocr-data-hrv \
+   tesseract-ocr-data-hun \
+   tesseract-ocr-data-isl \
    tesseract-ocr-data-ita \
    tesseract-ocr-data-jpn \
+   tesseract-ocr-data-kat \
    tesseract-ocr-data-lav \
    tesseract-ocr-data-lit \
+   tesseract-ocr-data-ltz \
+   tesseract-ocr-data-mkd \
+   tesseract-ocr-data-mlt \
    tesseract-ocr-data-nld \
    tesseract-ocr-data-nor \
    tesseract-ocr-data-pol \
    tesseract-ocr-data-por \
    tesseract-ocr-data-ron \
    tesseract-ocr-data-rus \
    tesseract-ocr-data-slk \
+   tesseract-ocr-data-slv \
    tesseract-ocr-data-spa \
+   tesseract-ocr-data-srp \
    tesseract-ocr-data-swe \
+   tesseract-ocr-data-tur \
    tesseract-ocr-data-ukr \
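
As a rough illustration of the "dynamically add languages" idea, the sketch below maps language codes to the package names used in the list above and estimates the added size from the 8 MB per language figure. The variable `EXTRA_OCR_LANGS` and both function names are hypothetical and are not existing docspell settings; the `apk` line assumes an Alpine-based image and is left commented out.

```shell
#!/bin/sh
# Hypothetical sketch: EXTRA_OCR_LANGS is NOT an existing docspell option.

# Print one tesseract data package name per language code given.
ocr_packages() {
  for lang in "$@"; do
    printf 'tesseract-ocr-data-%s\n' "$lang"
  done
}

# Estimate the added installed size in MB, assuming about 8 MB per
# language pack (the approximate figure quoted in this issue).
estimate_mb() {
  echo $(( $# * 8 ))
}

# Possible use at container start (needs root on an Alpine image):
# apk add --no-cache $(ocr_packages $EXTRA_OCR_LANGS)
```

With something like this, the default image could keep its current language set and only pull in the extra packs a user asks for.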

lonesomewalker commented 5 days ago

How about tesseract-ocr-all?