JaviCru opened this issue 1 month ago
Hello Coqui-AI team,
I would like to request a feature that would improve fine-tuning on data containing non-verbal sounds, such as laughter, in transcribed conversations. Specifically, I suggest supporting a special token or keyword, such as [laugh], that can be used in transcripts during fine-tuning to mark instances of laughter in the audio data.
For instance, if a person laughs in an audio file, the transcription could include the special token [laugh] at the appropriate point. This way, when the model is fine-tuned, it learns to recognize and reproduce laughter in the synthesized speech.
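To make the proposal concrete, here is a minimal sketch of what such annotated transcripts could look like. The token name `[laugh]`, the word-index interface, and the helper function are all hypothetical illustrations of the requested format, not an existing Coqui TTS API:

```python
# Hypothetical annotation helper: insert a [laugh] token into a transcript
# after the words where laughter occurs, so a fine-tuned model could learn
# the token like any other "word". The token name is an assumption.

SPECIAL_TOKENS = {"[laugh]"}

def annotate(words, laugh_after):
    """Return the transcript with [laugh] inserted after each index in laugh_after."""
    out = []
    for i, word in enumerate(words):
        out.append(word)
        if i in laugh_after:
            out.append("[laugh]")
    return " ".join(out)

line = annotate("that was so funny".split(), laugh_after={3})
print(line)  # that was so funny [laugh]
```

The annotated text would then be placed in the training metadata (e.g., an LJSpeech-style `metadata.csv`), and the tokenizer would need to treat `[laugh]` as a single unit rather than splitting it into characters.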
Thank you for considering this request. You are doing a fantastic job.