aedocw closed this issue 9 months ago
Good solution: use whisper (https://github.com/openai/whisper) to transcribe each audio chunk after it's done, then compare the transcript to the original text. The comparison should be somewhat fuzzy, since things like names may be spelled differently but pronounced the same way; fuzzywuzzy would be a good library for this. If the similarity between the original and the transcript falls below some threshold, re-encode that chunk.
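A minimal sketch of that check might look like the following. The function names and the threshold of 80 are my own illustrative choices, and the scoring here uses `difflib` from the standard library (which fuzzywuzzy's `fuzz.ratio` wraps) so the snippet runs standalone; the whisper calls are shown in comments.

```python
# Sketch: score a whisper transcript against the original chunk text
# and decide whether the chunk should be re-encoded.
from difflib import SequenceMatcher


def similarity(original: str, transcript: str) -> int:
    """Return a 0-100 similarity score, comparable to fuzzywuzzy's fuzz.ratio."""
    a, b = original.strip().lower(), transcript.strip().lower()
    return round(SequenceMatcher(None, a, b).ratio() * 100)


def chunk_needs_retry(original: str, transcript: str, threshold: int = 80) -> bool:
    """True when the transcript diverges enough that the chunk looks garbled."""
    return similarity(original, transcript) < threshold


# With whisper installed, the transcript would come from something like:
#   import whisper
#   model = whisper.load_model("base")
#   transcript = model.transcribe("chunk_0001.wav")["text"]

print(chunk_needs_retry("Call me Ishmael.", "Call me Ishmael."))  # False
print(chunk_needs_retry("Call me Ishmael.", "aurgh blub fnord"))  # True
```

Lowercasing before comparison keeps the check tolerant of capitalization differences, which matters since whisper's casing won't always match the source text.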
Sometimes when using XTTS, one sentence group/chunk will sound like nonsense. I can't reproduce it at will, but it came up a few times in one of the first long books I created.