Adding controlled pauses to the generated speech

showgan commented 7 months ago

Hi,

Is it possible to add specific pauses of various lengths, e.g. a short pause or a long pause, to the generated speech by adding special characters or tokens? If there is no such feature, can someone point me to the relevant code where I can try to add such an enhancement (in my cloned repo)?

The background for this question is that, for instance, I would like to generate speech from a few sentences of text which represent a dialog between two or more persons. If I end each sentence with a period, that creates some pause, but often it's not enough to emphasize that the next part of the speech is spoken by a different person, so I would like to be able to insert longer pauses.

Thanks!

Flux9665 commented 7 months ago

Adding a new symbol is unfortunately not so easy, but you can control the duration of every sound manually by overwriting the durations that the duration predictor predicts.

If the pauses should happen in between speaker turns, you could also just add them manually when you stitch the utterances together; that might be the easiest way of dealing with this.
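
A minimal sketch of the stitching idea, assuming each utterance has already been synthesized into a plain sequence of float samples at one common sample rate. The function name `stitch_with_pauses`, the default sample rate of 24000 and the pause lengths are illustrative assumptions, not part of the toolkit:

```python
def stitch_with_pauses(utterances, pauses_sec, sample_rate=24000):
    """Join waveforms, inserting pauses_sec[i] seconds of silence after utterance i.

    utterances:  list of sample sequences (hypothetical input format)
    pauses_sec:  one pause length in seconds per gap between utterances
    sample_rate: samples per second (assumed value, adjust to your vocoder)
    """
    if len(pauses_sec) != len(utterances) - 1:
        raise ValueError("need exactly one pause per gap between utterances")

    stitched = []
    for i, wave in enumerate(utterances):
        stitched.extend(float(sample) for sample in wave)
        if i < len(pauses_sec):
            # Silence is simply a run of zero-valued samples.
            n_silence = int(round(pauses_sec[i] * sample_rate))
            stitched.extend([0.0] * n_silence)
    return stitched
```

Because the silence is added after synthesis, its length does not depend on what the model generates for a pause symbol. With NumPy arrays the same thing can be done with `np.zeros` and `np.concatenate`.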

showgan commented 6 months ago

Thanks for the suggestions. I actually also need pauses in the middle of a single speaker's sentence in some cases. I was able to control the length of a pause by changing code in Preprocessing/TextFrontend.py. I basically disabled the code that collapses multiple "~" into a single one, e.g.: phones = re.sub("~+", "~", phoneme_string)
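
To illustrate what that substitution does, here is a small standalone sketch. It assumes, as described above, that "~" is the pause symbol in the phoneme string. The helper name `collapse_pauses`, its `keep_repeats` flag and the example strings are made up for illustration; this is not the actual code from Preprocessing/TextFrontend.py:

```python
import re

def collapse_pauses(phoneme_string, keep_repeats=False):
    """Collapse runs of the pause symbol "~" into a single one.

    With keep_repeats=True the string is returned unchanged, which mimics
    disabling the collapsing step so repeated "~" can lengthen a pause.
    """
    if keep_repeats:
        return phoneme_string
    return re.sub("~+", "~", phoneme_string)

print(collapse_pauses("hello~~~world"))                     # hello~world
print(collapse_pauses("hello~~~world", keep_repeats=True))  # hello~~~world
```

With the collapsing step skipped, each additional "~" reaches the model as its own pause symbol.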

This works fine to some extent, but if I try to add very long pauses, I get undesired sounds at the end of the pause. I wonder if this can be mitigated.

DigitalPhonetics / IMS-Toucan

Adding controlled pauses to the generated speech #163