CypherousSkies / reading-for-listeners

A deep-learning-powered accessibility application which turns PDFs into audio files. Featuring OCR improvement and TTS with inflection!
GNU Affero General Public License v3.0

Use OpenCV+TrOCR to improve results #13

Open CypherousSkies opened 2 years ago

CypherousSkies commented 2 years ago

This should remove the need for BERT altogether, although it might be pretty expensive to run. paper | code

CypherousSkies commented 2 years ago

Pushing this back until TrOCR has a transformers page.

CypherousSkies commented 2 years ago

TrOCR now has a transformers page! Time to experiment! This will need #9 to preprocess images. Chances are it will be untenably slow on CPU and may have issues with non-English texts, but that's a future-me problem.
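
For anyone picking this up, here is a rough sketch of what the OpenCV+TrOCR experiment could look like. It is not code from this repo. TrOCR recognizes one text line per image, so the page has to be cut into line crops first; the `split_lines` helper below is a hypothetical stand-in for that step (it is not necessarily what #9 does), and `ocr_page` and the `microsoft/trocr-base-printed` checkpoint are likewise assumptions chosen for illustration.

```python
import numpy as np


def split_lines(binary, min_height=2):
    """Hypothetical helper: return (top, bottom) row ranges of text lines.

    `binary` is a 2-D array where nonzero pixels are ink. A "line" is a run
    of consecutive rows that each contain at least one ink pixel.
    """
    has_ink = (np.asarray(binary) > 0).any(axis=1)
    spans, start = [], None
    for y, ink in enumerate(has_ink):
        if ink and start is None:
            start = y  # a new run of inked rows begins
        elif not ink and start is not None:
            if y - start >= min_height:
                spans.append((start, y))
            start = None
    # Close a run that reaches the bottom edge of the page.
    if start is not None and len(has_ink) - start >= min_height:
        spans.append((start, len(has_ink)))
    return spans


def ocr_page(path, model_name="microsoft/trocr-base-printed"):
    """Sketch: binarize a page with OpenCV, then run TrOCR on each line crop."""
    # Heavy dependencies are imported lazily so split_lines stays usable alone.
    import cv2
    from PIL import Image
    from transformers import TrOCRProcessor, VisionEncoderDecoderModel

    processor = TrOCRProcessor.from_pretrained(model_name)
    model = VisionEncoderDecoderModel.from_pretrained(model_name)

    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)
    # Otsu threshold, inverted so that ink becomes nonzero.
    _, binary = cv2.threshold(
        gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU
    )

    lines = []
    for top, bottom in split_lines(binary):
        crop = Image.fromarray(gray[top:bottom]).convert("RGB")
        pixel_values = processor(images=crop, return_tensors="pt").pixel_values
        ids = model.generate(pixel_values)
        lines.append(processor.batch_decode(ids, skip_special_tokens=True)[0])
    return "\n".join(lines)
```

Calling `ocr_page("page.png")` runs one `generate` pass per detected line, which is where the CPU cost mentioned above would show up. The row-projection split only works on clean, unskewed, single-column scans, so real pages would likely need more preprocessing than this.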