aedocw / epub2tts

Turn an epub or text file into an audiobook
Apache License 2.0

Pause between sentences #208

Closed winfran closed 4 months ago

winfran commented 4 months ago

Is it possible to change the pause length between sentences? I find that very often, with xtts, there is almost no pause between two sentences.

I'm using the reference voice from https://huggingface.co/spaces/coqui/xtts

aedocw commented 4 months ago

Unfortunately with Coqui TTS there's no way that I am aware of to specify how long pauses should be between sentences.

Another user has shared some example code that breaks up the source material into individual sentences. Going from there, it would be possible to create a WAV file per sentence, and then insert a silence of configurable length between sentences while doing the combine operation for each chapter.
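A minimal sketch of that combine step, using only Python's standard `wave` module. This is not the code epub2tts uses: the function name `join_with_silence` and its `pause_seconds` parameter are hypothetical, and it assumes every per-sentence clip is signed PCM (for example 16-bit) with the same sample rate, sample width, and channel count.

```python
import io
import wave


def join_with_silence(sentence_wavs, pause_seconds=0.6):
    """Concatenate WAV clips (as bytes), inserting silence between them.

    Hypothetical helper for illustration. Assumes signed PCM audio, where
    zero bytes are silence (this does not hold for 8-bit unsigned WAV).
    """
    if not sentence_wavs:
        raise ValueError("need at least one clip")

    params = None
    frames = []
    for data in sentence_wavs:
        with wave.open(io.BytesIO(data), "rb") as clip:
            fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError("all clips must share the same audio format")
            frames.append(clip.readframes(clip.getnframes()))

    channels, width, rate = params
    # One frame is channels * width bytes; the pause is a whole number of frames.
    silence = b"\x00" * (int(rate * pause_seconds) * channels * width)

    out = io.BytesIO()
    with wave.open(out, "wb") as combined:
        combined.setnchannels(channels)
        combined.setsampwidth(width)
        combined.setframerate(rate)
        # bytes.join places the silence only between clips, not after the last one.
        combined.writeframes(silence.join(frames))
    return out.getvalue()
```

The returned bytes can be written straight to a `.wav` file for the chapter. Because the pause is a parameter, it is easy to compare different values on the same text.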

There was also this comment the other day that appears to achieve the same thing. I'll take a closer look at that in the next few days and give it a try. I do agree with both of you that putting in a consistent pause between each sentence would be a big improvement.

winfran commented 4 months ago

Thank you for giving it a go, can't wait to see how it turns out!

aedocw commented 4 months ago

There's a branch linked to this issue that incorporates a pause of 0.6 seconds after each detected sentence. I've only tested it on a relatively small sample with 11 sentences, but it works nicely. I'm going to give it a try on a full book and see how it turns out before merging into main.

If you are able to, you could check out that branch and give it a try yourself. If you can do that please report back here and let us know what you think.

winfran commented 4 months ago

Thank you so much for making the changes! It does sound much more fluid now. However, I looked at the code on line 190 that sets the silence duration and experimented a bit with various values. In the runs I did, 0.7 was the most intelligible. With 0.6, I sometimes lose track of what it's reading. With 0.8, it sometimes introduces longer pauses in the middle of a sentence, typically at a comma or near an "and", and also shifts the intonation a bit, as if a new sentence had started.

Another observation, which is independent of the pause length parameter: whenever a sentence is longer than 240 Notepad columns (including white space), one of two things happens:

a) the sentence ends abruptly around column 240 and the rest of the text is read as if a new sentence were beginning, which produces an unnatural intonation;

b) the voice starts to mumble something or repeats the last two or three words, goes silent, and then continues reading as though from the middle of the sentence (without changing intonation).

It seems as though Coqui is unable to read sentences longer than 240 characters and force-cuts them at that point. To prevent this from breaking the flow of the text, maybe it would be possible to split the sentence at the first comma, or before the first "and", that precedes column 230, for example? (I'd suggest not waiting until it hits its buffer of approx. 240, but splitting before 230 or 220 to give the model a bit of leeway.)
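One way the suggested splitting could look, as a rough sketch rather than anything epub2tts is known to do. The function name `split_long_sentence` and the default `limit=230` are illustrative assumptions taken from the numbers in this comment. It reads "the first comma or 'and' that precedes column 230" as the last such break point inside the first 230 characters, and it falls back to the last space (or a hard cut) when neither is present.

```python
def split_long_sentence(sentence, limit=230):
    """Split a sentence into chunks of at most `limit` characters.

    Hypothetical helper: prefers to break after a comma or before " and ",
    whichever occurs latest inside the limit.
    """
    chunks = []
    rest = sentence.strip()
    while len(rest) > limit:
        window = rest[:limit]
        comma = window.rfind(", ")
        conj = window.rfind(" and ")
        if comma > 0 and comma >= conj:
            cut = comma + 1  # keep the comma with the left-hand chunk
        elif conj > 0:
            cut = conj  # "and" starts the next chunk
        else:
            cut = window.rfind(" ")  # no comma or "and": break at a space
            if cut <= 0:
                cut = limit  # no space at all: hard cut at the limit
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each chunk would then be synthesized on its own. Whether to insert the sentence pause, a shorter pause, or no pause between chunks of the same sentence is a separate choice that this sketch does not make.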

And a third and final observation: I notice that it frequently doesn't read the final letters (usually "s" and "t") of a sentence. Is this Coqui-related, or do you have code in epub2tts that somehow silences the voice at the end of the sentence?

The key message though is: great job on this update! It very much improves the quality and legibility/understandability of the text being read.