dglazkov / polymath


Add an OCR importer #86

Closed: PaulKinlan closed 1 year ago

PaulKinlan commented 1 year ago

I have a bunch of images (contracts, etc.) containing a lot of text that is not otherwise accessible, and I would like to include them in my library.

This PR uses Tesseract (it assumes Tesseract is already installed) to extract the text from each page and then include it in a library.
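For illustration only, here is a minimal sketch of how that flow might look, assuming the `tesseract` command-line tool is on the `PATH`. It is not the code from this PR: the function names (`build_tesseract_command`, `ocr_image`, `image_to_record`) and the record fields are hypothetical, and the `url` field is left empty because that question is still open.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path):
    """Build the CLI call; `tesseract <image> stdout` prints the recognized text."""
    return ["tesseract", str(image_path), "stdout"]


def ocr_image(image_path):
    """Run Tesseract on one image and return the extracted text."""
    # The importer assumes Tesseract is installed, so fail early if it is not.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract not found on PATH; install it first")
    result = subprocess.run(
        build_tesseract_command(image_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits non-zero
    )
    return result.stdout.strip()


def image_to_record(image_path, text):
    """Wrap OCR text in a hypothetical record shape (not Polymath's real schema)."""
    return {
        "text": text,
        "source": Path(image_path).name,
        "url": None,  # assumption: left unset until the URL question is resolved
    }
```

A caller could then do something like `image_to_record(path, ocr_image(path))` for each image before adding the result to a library.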

I've not yet worked out what to do with the URL.