OCR4all / LAREX

A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books.
MIT License

Text recognition #244

Closed (IlaCode closed this issue 3 years ago)

IlaCode commented 3 years ago

I am trying to recognize the text of a historical book. After segmenting regions and selecting lines of text, doesn't it automatically recognize text? Can I just manually type the text? The book has 100 pages and I would like to speed up the process.

maxnth commented 3 years ago

> I am trying to recognize the text of a historical book. After segmenting regions and selecting lines of text, doesn't it automatically recognize text?

There's OCR4all for that. OCR4all uses LAREX for certain workflow steps (region segmentation, post correction, ...) but additionally allows text prediction and model training.

You can use the PAGE XML files you created using LAREX.
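Before handing the files over, it can help to see which lines already carry a transcription. The following is only a rough sketch in Python, not part of LAREX or OCR4all: it assumes the usual PAGE XML layout (`TextLine` elements with an optional `TextEquiv`/`Unicode` child), and the helper names and the `SAMPLE` document are made up for illustration. Tags are matched by local name because the PAGE namespace differs between schema versions.

```python
import xml.etree.ElementTree as ET

# Made-up sample document for illustration; real files exported from LAREX
# will be larger and may use a different PAGE schema version.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="0001.png" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r0">
      <TextLine id="r0_l0"><Coords points="10,10 500,10 500,60 10,60"/></TextLine>
      <TextLine id="r0_l1"><Coords points="10,70 500,70 500,120 10,120"/>
        <TextEquiv><Unicode>Example line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def list_text_lines(page_xml):
    """Return (line_id, text) pairs for every TextLine in a PAGE XML string.

    text is None when the line has no transcription yet.
    """
    root = ET.fromstring(page_xml)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        text = None
        # Only direct TextEquiv children count, so word-level text is ignored.
        for child in elem:
            if local_name(child.tag) == "TextEquiv":
                for sub in child:
                    if local_name(sub.tag) == "Unicode":
                        text = sub.text
        lines.append((elem.get("id"), text))
    return lines


for line_id, text in list_text_lines(SAMPLE):
    print(line_id, "(no text yet)" if text is None else text)
```

Lines reported without text are the ones that still need either manual transcription or text prediction in OCR4all.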

IlaCode commented 3 years ago

Thanks a lot. Anyway, I wanted to let you know that I followed the VirtualBox installation guide: on Windows 10 I can't reach the URL (err_connection__timeout), but it works perfectly on my Windows 7 machine. See you soon!

maxnth commented 3 years ago

> Anyway, I wanted to let you know that I followed the VirtualBox installation guide: on Windows 10 I can't reach the URL (err_connection__timeout), but it works perfectly on my Windows 7 machine. See you soon!

Do other VirtualBox VMs (which are also accessible from the host, e.g. through port forwarding) work for you on Windows 10?
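One quick way to narrow this down is to test from the Windows 10 host whether the forwarded port accepts TCP connections at all. This is a minimal sketch in Python, independent of OCR4all; the function name is made up here, and the port `8080` is only a placeholder for whatever host port your VM's port forwarding rule actually uses.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder port: replace it with the host port of your forwarding rule.
    print(port_is_reachable("127.0.0.1", 8080))
```

If this prints `False` while the VM is running, the problem is likely in the forwarding rule or a firewall on the host rather than in the browser.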

maxnth commented 3 years ago

Feel free to reopen if the problem persists.