
ArtifexSoftware / mupdf.js

JavaScript bindings for MuPDF

https://mupdfjs.readthedocs.io

GNU Affero General Public License v3.0


OCR #51

Open. gsemyong opened this issue 2 months ago.

gsemyong commented 2 months ago

I haven't found any reference to OCR in the mupdf.js docs, but I see that Tesseract is an optional dependency of MuPDF. Is there an option to do OCR using mupdf.js?

jamie-lemon commented 2 months ago

There is no option for OCR. Supporting it would add a considerable number of megabytes to the codebase, which we can't afford for a web runtime. OCR depends on a large, heavy set of libraries and also needs per-language training data files that it would have to be given access to. At some point we might consider an OCR plugin, but it would have to be a separate project.