Closed Ca-ressemble-a-du-fake closed 1 year ago
Now that you mention it, I realize that I never put this function anywhere in the repo; I only have it lying around as a script. I will turn it into a proper, easy-to-use script and include it in the toolkit when I find the time. It works by using the aligner to calculate which spectrogram frames belong to which utterance, and then splitting the waveform at the same ratios, so each utterance gets the share of the audio that matches its share of the frames. It's simple and a brain twister to think about at the same time.
You have to segment the text manually by speaker, but the cutting of the audio happens automatically.
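To make the idea concrete, here is a minimal sketch of the ratio-based cut, not the actual script. It assumes the aligner has already given you the number of spectrogram frames for each utterance; the function name `split_by_frame_ratio` and its arguments are hypothetical and do not exist in the toolkit.

```python
from itertools import accumulate


def split_by_frame_ratio(waveform, frames_per_utterance):
    """Cut a waveform into one segment per utterance.

    waveform: sequence of audio samples (hypothetical input).
    frames_per_utterance: spectrogram frame count the aligner assigned
        to each utterance, in order (assumed to be available already).
    """
    total_frames = sum(frames_per_utterance)
    if total_frames <= 0:
        raise ValueError("need at least one spectrogram frame")

    n_samples = len(waveform)
    # Map cumulative frame counts onto sample indices using the same ratio.
    boundaries = [0] + [
        (cum_frames * n_samples) // total_frames
        for cum_frames in accumulate(frames_per_utterance)
    ]
    # Slice between consecutive boundaries: one segment per utterance.
    return [waveform[start:end] for start, end in zip(boundaries, boundaries[1:])]


if __name__ == "__main__":
    audio = list(range(1000))      # stand-in for real audio samples
    frame_counts = [10, 30, 60]    # stand-in for aligner output
    segments = split_by_frame_ratio(audio, frame_counts)
    print([len(segment) for segment in segments])
```

Because the boundaries come from cumulative frame counts, the segments cover the whole waveform without gaps or overlaps, whatever the hop size of the spectrogram is.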
Thanks for your answer. Looking forward to testing it!
Hi,
In the paper you talk about "Customization of Voice-Acting" and in the demo about "Audiobook to Audioplay". Does this work automatically? I could not find the function that splits the text into segments for each speaker. Or did you do it all manually?
Thanks in advance for shedding light on this :smile:!