How can we synthesize specific laughter?

Aria-K-Alethia / laughter-synthesis

Official implementation of the paper "Laughter Synthesis using Pseudo Phonetic Tokens with a Large-scale In-the-wild Laughter Corpus" accepted by INTERSPEECH 2023.

MIT License

I noticed that the tlm sample uses data/prompt.txt. Is this file a sequence of discrete laughter units?

I am wondering how we can synthesize a specific kind of laughter, such as haahaahaa, hiihiihii, hnnhnnhnn, or heeheehee. With phonemes, we could simply map the laughter to specific phonemes and then train a TTS model to synthesize it. With your tlm, as far as I understand, we can only start from a laughter audio clip: we obtain a sequence of discrete units from it by k-means clustering, sample a new sequence with the tlm so that it resembles the original audio, and then feed this sequence to the TTS model to synthesize the laughter.
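For illustration, the unit-extraction step described above (frames of audio features mapped to discrete units by k-means) can be sketched as follows. This is only a toy sketch under stated assumptions, not the repository's code: the function names `nearest_unit` and `frames_to_units` are hypothetical, the centroids are assumed to come from an already trained k-means model, and the 2-dimensional frames stand in for whatever features the real pipeline extracts from audio.

```python
# Toy sketch: assign each feature frame to its nearest k-means centroid.
# All names here are hypothetical; real features would be high-dimensional
# vectors extracted from laughter audio, not hand-written 2-D points.

def nearest_unit(frame, centroids):
    """Return the index of the centroid closest to `frame` (squared L2)."""
    best_index = 0
    best_distance = float("inf")
    for index, centroid in enumerate(centroids):
        distance = sum((a - b) ** 2 for a, b in zip(frame, centroid))
        if distance < best_distance:
            best_index = index
            best_distance = distance
    return best_index


def frames_to_units(frames, centroids):
    """Map a sequence of feature frames to a sequence of discrete unit ids."""
    return [nearest_unit(frame, centroids) for frame in frames]


# Assumed, already-trained centroids (one per discrete unit).
centroids = [[0.0, 0.0], [10.0, 10.0]]
# Assumed feature frames for one short laughter clip.
frames = [[0.1, -0.2], [9.5, 10.2], [0.3, 0.1]]
units = frames_to_units(frames, centroids)
```

The resulting list of unit ids is the kind of sequence that would then be passed to the tlm for sampling; how that sequence is stored on disk (for example in data/prompt.txt) is exactly what the question above asks.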

I am not sure whether I understand the usage of the tlm correctly. Is there another way to synthesize the specific laughter we want?

How can we synthesize specific laughter? #2