pdf2pdfocr changing languages

This isn't covered in the README, so here is some info for anyone else who gets lost. You can change which language model gets downloaded by editing this command:

aria2c "https://github.com/tesseract-ocr/tessdata/blob/main/por.traineddata?raw=true" --dir="%TESSDATA_PREFIX%"

Replace the language code in the file name (here "por") with the code of the language you want, as long as it's available in the tesseract tessdata repo. For example, Swedish is "swe", so the file name becomes swe.traineddata.
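If you'd rather script the download than edit the aria2c line by hand, here is a rough Python sketch of the same idea. It reuses the URL pattern from the command above and only swaps the language code. The function names `traineddata_url` and `download_traineddata` are made up for illustration and are not part of pdf2pdfocr; it also assumes `TESSDATA_PREFIX` is set in your environment, as the aria2c command does.

```python
import os
import urllib.request

# URL pattern copied from the aria2c command above; only the language code varies.
TESSDATA_URL = "https://github.com/tesseract-ocr/tessdata/blob/main/{lang}.traineddata?raw=true"


def traineddata_url(lang):
    """Return the download URL for one Tesseract language code, e.g. 'swe'."""
    return TESSDATA_URL.format(lang=lang)


def download_traineddata(lang, dest_dir=None):
    """Download <lang>.traineddata into dest_dir.

    Hypothetical helper: dest_dir defaults to the TESSDATA_PREFIX
    environment variable, mirroring --dir in the aria2c command.
    """
    dest_dir = dest_dir or os.environ["TESSDATA_PREFIX"]
    target = os.path.join(dest_dir, lang + ".traineddata")
    urllib.request.urlretrieve(traineddata_url(lang), target)
    return target
```

Calling `download_traineddata("swe")` should then fetch the Swedish model into the same folder the aria2c command writes to, provided that code exists in the tessdata repo.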

Further info here: https://github.com/tesseract-ocr/tesseract/blob/main/doc/tesseract.1.asc#LANGUAGES

To change the default language, edit line 548 of pdf2pdfocr.py and replace Portuguese + English ("por+eng") with whichever languages you want. I use Swedish + English ("swe+eng"), so I changed

self.tess_langs = "por+eng"  # Default

to

self.tess_langs = "swe+eng"  # Default
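Since the default is a "+"-separated list of language codes, each code needs its own .traineddata file in the tessdata folder. A small sketch for checking that before running OCR, assuming the files live directly in the folder you pass in (for example the one `TESSDATA_PREFIX` points to). `missing_languages` is a hypothetical helper, not something pdf2pdfocr provides.

```python
import os


def missing_languages(tess_langs, tessdata_dir):
    """Return the codes in a 'swe+eng'-style string that have no .traineddata file."""
    # Split the combined string into individual codes, skipping empty parts.
    codes = [code for code in tess_langs.split("+") if code]
    return [
        code for code in codes
        if not os.path.isfile(os.path.join(tessdata_dir, code + ".traineddata"))
    ]
```

For example, `missing_languages("swe+eng", os.environ["TESSDATA_PREFIX"])` would list whichever of the two models still needs to be downloaded.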


LeoFCardoso / pdf2pdfocr

pdf2pdfocr changing languages #36