Aria-K-Alethia / laughter-synthesis

Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" accepted by INTERSPEECH 2023.
MIT License

How can we synthesize specific laughter? #2

Closed: duj12 closed this issue 1 year ago

duj12 commented 1 year ago

I noticed that the tlm sample uses data/prompt.txt. Is this file a sequence of discrete laughter units?

I am wondering how we can synthesize a specific kind of laughter, such as haahaahaa, hiihiihii, hnnhnnhnn, or heeheehee. With phonemes, we could simply map the laughter to specific phonemes and then train a TTS model to synthesize it. With your tlm, it seems we can only take a laughter audio clip, convert it to a sequence of discrete units by k-means clustering, and then sample a new sequence with tlm. That gives a laughter sequence similar to the original audio, which we then feed to the TTS model to synthesize the laughter.

I am not sure whether I understand the usage of tlm correctly. Is there another way to synthesize the specific laughter we want?

Aria-K-Alethia commented 1 year ago

I noticed that the tlm sample uses data/prompt.txt. Is this file a sequence of discrete laughter units?

Yes. If a prompt is provided, tlm generates a continuation of that prompt.

Regarding your next question: you can find a sample of the specific laughter you want to synthesize and use it as a prompt, so that tlm generates that kind of laughter. You can of course also train the model on phonemes, as long as you have enough labeled data.
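
To make the prompt idea concrete, here is a minimal sketch of how a laughter sample could be turned into a sequence of discrete units by nearest-centroid assignment (the assignment step of k-means). It is an illustration under stated assumptions, not the repository's actual preprocessing: the function names are hypothetical, the toy 2-D features stand in for frame-level features from a speech model, the centroids stand in for a trained k-means codebook, and both the collapsing of repeated units and the space-separated output format are assumptions about what a prompt file might look like.

```python
import numpy as np


def quantize(features: np.ndarray, centroids: np.ndarray) -> list:
    """Map each feature frame to the index of its nearest centroid."""
    # Squared Euclidean distance between every frame and every centroid,
    # computed by broadcasting to shape (num_frames, num_centroids).
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1).tolist()


def collapse_repeats(units: list) -> list:
    """Drop consecutive duplicate units (assumption: the model uses deduplicated units)."""
    out = []
    for u in units:
        if not out or out[-1] != u:
            out.append(u)
    return out


def to_prompt_line(units: list) -> str:
    """Serialize units as space-separated integers (assumed prompt format)."""
    return " ".join(str(u) for u in units)


# Toy stand-ins: 3 centroids and 6 frames of 2-D features.
centroids = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
features = np.array(
    [[0.1, -0.1], [0.2, 0.0], [0.9, 1.2], [4.8, 5.1], [5.2, 4.9], [1.1, 0.9]]
)

units = quantize(features, centroids)
prompt = to_prompt_line(collapse_repeats(units))
print(prompt)
```

In practice the features would come from the laughter audio you want to imitate, and the resulting unit sequence would be supplied as the prompt so that tlm continues it.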