CypherousSkies / reading-for-listeners

A deep-learning-powered accessibility application which turns PDFs into audio files. Featuring OCR improvement and TTS with inflection!
GNU Affero General Public License v3.0

Audio Artefacts? #8

Open CypherousSkies opened 2 years ago

CypherousSkies commented 2 years ago

The reader sometimes repeats itself in increasingly distorted ways. Why? What can I do to fix it?

CypherousSkies commented 2 years ago

Seems to be fixed by using tacotron2-DDC rather than tacotron2-DCA, but that doesn't explain why.

CypherousSkies commented 2 years ago

Update: DDC is just less cursed about it than DCA. Still unclear why; could probably be fixed by fine-tuning? Add it to the list of "for when I have an ML rig".

CypherousSkies commented 2 years ago

The issue only comes up when an especially short sentence is input and the decoder keeps running after the sentence has ended. Still not sure why, but that's probably Coqui's problem. Again, I might be able to fix it myself with some fine-tuning, or by setting the decoder length at the sentence level rather than the text level.
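
For reference, a rough sketch of what "decoder length at the sentence level" could look like: split the text into sentences and give each one a step cap proportional to its length, so a short sentence cannot run on for long. Everything here is an assumption for illustration. The constants (`STEPS_PER_CHAR`, `MIN_STEPS`, `MAX_STEPS`) are untuned guesses, the function names are hypothetical, and the splitter is deliberately naive. How the cap would actually be passed to the Coqui model is not shown.

```python
import re

# Hypothetical, untuned constants; not taken from Coqui TTS or this repo.
STEPS_PER_CHAR = 8    # assumed decoder steps needed per input character
MIN_STEPS = 50        # floor so very short sentences still get some room
MAX_STEPS = 2000      # ceiling so one long sentence cannot run unbounded


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def decoder_step_budget(sentence, steps_per_char=STEPS_PER_CHAR):
    """Return a per-sentence decoder step cap, clamped to [MIN_STEPS, MAX_STEPS]."""
    budget = len(sentence) * steps_per_char
    return max(MIN_STEPS, min(MAX_STEPS, budget))


def budgets_for_text(text):
    """Pair every sentence in the text with its own step cap."""
    return [(s, decoder_step_budget(s)) for s in split_sentences(text)]


if __name__ == "__main__":
    sample = "Hi. This is a much longer sentence, with more to say."
    for sentence, cap in budgets_for_text(sample):
        print(f"{cap:5d}  {sentence}")
```

The point of the per-sentence cap is that a two-word sentence gets a small budget instead of inheriting one sized for the whole text, which is where the runaway repetition seems to come from.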