CypherousSkies / reading-for-listeners

A deep-learning-powered accessibility application that turns PDFs into audio files, featuring OCR improvement and TTS with inflection!
GNU Affero General Public License v3.0

French text special characters cause crashes #15

Closed. CypherousSkies closed this 2 years ago

CypherousSkies commented 2 years ago

Scans of French-language texts contain non-linguistic artifacts (much like English texts contain ^L, the form-feed character). Just need to find them and add them to TextProcessor's filter.
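For illustration, a minimal sketch of that kind of filter, assuming the artifacts are control characters such as form feed (`\x0c`, shown as ^L). `strip_control_chars` is a hypothetical helper, not the project's actual TextProcessor code, and the real artifacts in French scans may turn out to be something else entirely.

```python
import unicodedata

# Hypothetical helper for illustration; not part of TextProcessor.
# Characters we keep even though they are in the control category.
KEEP = {"\n", "\t"}


def strip_control_chars(text: str) -> str:
    """Drop control characters (Unicode category Cc) except newline and tab.

    Accented letters such as "é" or "à" are category Ll/Lu, so they pass through.
    """
    return "".join(
        ch for ch in text
        if ch in KEEP or unicodedata.category(ch) != "Cc"
    )


cleaned = strip_control_chars("Caf\u00e9\x0c d\u00e9j\u00e0\n")
```

Note that this removes the form feed outright; if page breaks fall between words with no surrounding space, replacing it with a newline instead may be the safer choice.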

CypherousSkies commented 2 years ago

Nope. Well, kinda. The problem is that TTS.utils.synthesizer.Synthesizer.split_into_sentences only really works for English sentences, for some reason. In the French text I'm testing, it treats a whole paragraph (4k+ characters) as a single sentence. It's not. So it's time to override it with nltk. Hopefully. On a day that's not today.
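A quick way to check for the symptom described here is to split the text and flag any "sentence" that is suspiciously long. The sketch below is a standard-library stand-in, not nltk and not the TTS splitter; the 500-character limit and both function names are assumptions made for illustration.

```python
import re

# Assumed threshold; the thread only mentions a 4k+ character "sentence".
MAX_SENTENCE_CHARS = 500

# Split on whitespace that follows terminal punctuation. French typography
# puts a space before "?" and "!", which this pattern leaves inside the sentence.
_SENTENCE_END = re.compile(r"(?<=[.!?…])\s+")


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter; abbreviations like "M. Dupont" will be split wrongly."""
    return [s for s in _SENTENCE_END.split(text.strip()) if s]


def oversized(sentences: list[str], limit: int = MAX_SENTENCE_CHARS) -> list[str]:
    """Return the sentences longer than `limit` characters."""
    return [s for s in sentences if len(s) > limit]


parts = split_sentences("Il pleut. Comment ça va ? Très bien !")
suspicious = oversized(parts)
```

A trained tokenizer would handle abbreviations and quotations better than this regex; the point of the sketch is only the length check, which makes a 4k-character "sentence" easy to spot before it reaches the synthesizer.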

CypherousSkies commented 2 years ago

OK, update: it's not TTS's split_into_sentences. It likely has something to do with malformed headers and footers, as well as excessive spaces in the filtered text. It would likely be fixed by #9.
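As a rough sketch of the "excessive spaces" part only (header and footer cleanup is left to #9, whose details are not given here), whitespace in the filtered text could be normalized like this. `normalize_spaces` is a hypothetical name, not a function from the project.

```python
import re


def normalize_spaces(text: str) -> str:
    """Collapse runs of spaces/tabs, trim spaces around newlines, cap blank lines."""
    text = re.sub(r"[ \t]+", " ", text)      # runs of spaces or tabs -> one space
    text = re.sub(r" ?\n ?", "\n", text)     # drop spaces hugging a newline
    text = re.sub(r"\n{3,}", "\n\n", text)   # at most one blank line in a row
    return text.strip()


tidy = normalize_spaces("Bonjour    le   monde .  \n\n\n\n  Page  2 ")
```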

CypherousSkies commented 2 years ago

Closing for being redundant with #9 (and so that I feel better about releasing 0.0.4)