coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0
31.64k stars 3.78k forks

Add Support for laughter annotation in Fine-Tuning with a special token [Feature request] #3760

Open JaviCru opened 1 month ago

JaviCru commented 1 month ago

Hello Coqui-AI team,

I would like to request a new feature that would greatly enhance the training process for capturing non-verbal sounds, such as laughter, in transcribed conversations. Specifically, my suggestion is to implement a special token or keyword, such as [laugh], that can be used during fine-tuning to denote instances of laughter in the audio data.

For instance, if a person laughs in an audio file, the transcription could include the special token [laugh] at the appropriate point. This way, when the model is fine-tuned, it learns to recognize and reproduce laughter in the synthesized speech.
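As a rough illustration of the idea (not Coqui TTS's actual tokenizer API), the sketch below shows how a character-level text front end could treat [laugh] as a single symbol with its own ID instead of seven separate characters. The names SPECIAL_TOKENS, build_vocab, tokenize, and encode are hypothetical and exist only for this example.

```python
import re

# Hypothetical list of non-verbal markers used in the transcripts.
SPECIAL_TOKENS = ["[laugh]"]


def build_vocab(characters, special_tokens=SPECIAL_TOKENS):
    """Map each special token and each character to an integer ID.

    Special tokens come first so their IDs stay stable if the
    character set changes.
    """
    symbols = list(special_tokens) + sorted(set(characters))
    return {symbol: index for index, symbol in enumerate(symbols)}


def tokenize(text, special_tokens=SPECIAL_TOKENS):
    """Split text into characters, keeping each special token whole."""
    pattern = "(" + "|".join(re.escape(t) for t in special_tokens) + ")"
    tokens = []
    for piece in re.split(pattern, text):
        if piece in special_tokens:
            tokens.append(piece)   # keep "[laugh]" as one symbol
        else:
            tokens.extend(piece)   # ordinary text: one token per character
    return tokens


def encode(text, vocab, special_tokens=SPECIAL_TOKENS):
    """Convert a transcript line into a list of vocabulary IDs."""
    return [vocab[token] for token in tokenize(text, special_tokens)]


if __name__ == "__main__":
    vocab = build_vocab("abcdefghijklmnopqrstuvwxyz !,.'")
    print(tokenize("that's funny [laugh] really"))
    print(encode("hi [laugh]!", vocab))
```

With a scheme like this, the model would still need enough fine-tuning clips where [laugh] lines up with actual laughter in the audio in order to learn an embedding for the new symbol.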

Thank you for considering this request. You are doing a fantastic job.

stale[bot] commented 2 days ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.