DigitalPhonetics / IMS-Toucan

Controllable and fast Text-to-Speech for over 7000 languages!
Apache License 2.0

How does Audiobook to Audioplay work? #144

Closed: Ca-ressemble-a-du-fake closed this issue 1 year ago

Ca-ressemble-a-du-fake commented 1 year ago

Hi,

In the paper you talk about "Customization of Voice-Acting" and in the demo about "Audiobook to Audioplay". Does this work automatically? I could not find the function that splits the text into per-speaker segments. Or did you do it all manually?

Thanks in advance for shedding light on this :smile:!

Flux9665 commented 1 year ago

Now that you mention it, I realize that I never put this function anywhere in the repo; I only have it lying around as a script. I will turn it into a proper, easy-to-use script and include it in the toolkit when I find the time. It works by using the aligner to calculate which spectrogram frames belong to which utterance, and then splitting the waveform at the same ratios at which those utterance boundaries fall in the spectrogram. It's at the same time simple and a brain twister to think about.

You have to segment the text manually by speaker, but the cutting of the audio happens automatically.
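
For illustration only, the ratio-based cut described above might look roughly like the sketch below. It is not the script from the toolkit: the function name `split_waveform_by_frames` and its inputs are hypothetical, and it assumes the aligner has already produced the number of spectrogram frames for each manually segmented utterance.

```python
import numpy as np


def split_waveform_by_frames(waveform, frames_per_utterance):
    """Cut a 1-D waveform into one chunk per utterance.

    frames_per_utterance is assumed to come from an aligner: the number of
    spectrogram frames assigned to each utterance, in order (hypothetical input).
    """
    total_frames = sum(frames_per_utterance)
    if total_frames <= 0:
        raise ValueError("need at least one spectrogram frame")
    # Frame indices at which one utterance ends and the next begins.
    frame_boundaries = np.cumsum(frames_per_utterance)[:-1]
    # Carry each boundary over to the waveform using the same ratio.
    sample_boundaries = np.round(
        frame_boundaries / total_frames * len(waveform)
    ).astype(int)
    return np.split(np.asarray(waveform), sample_boundaries)


# Three utterances taking up 10, 30 and 60 frames of a 1000-sample waveform.
chunks = split_waveform_by_frames(np.arange(1000), [10, 30, 60])
print([len(chunk) for chunk in chunks])
```

Working with ratios rather than a fixed hop size means the sketch does not need to know the spectrogram settings, only how the frames are distributed across utterances.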

Ca-ressemble-a-du-fake commented 1 year ago

Thanks for your answer. Looking forward to testing it!