JaviCru opened this issue 1 month ago
Hello Coqui-AI team,
I would like to request a feature that would improve fine-tuning on data containing non-verbal sounds, such as laughter, in transcribed conversations. Specifically, I suggest supporting a special token or keyword, such as [laugh], that can be used in transcripts during fine-tuning to mark instances of laughter in the audio data.
For instance, if a person laughs in an audio file, the transcription could include the special token [laugh] at the appropriate point. This way, when the model is fine-tuned, it learns to recognize and reproduce laughter in the synthesized speech.
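To make the proposal concrete, here is a minimal sketch of what such annotated transcripts could look like. The token name `[laugh]`, the word-index interface, and the helper function are all hypothetical illustrations of the requested format, not an existing Coqui TTS API:

```python
# Hypothetical annotation helper: insert a [laugh] token into a transcript
# after the words where laughter occurs, so a fine-tuned model could learn
# the token like any other "word". The token name is an assumption.

SPECIAL_TOKENS = {"[laugh]"}

def annotate(words, laugh_after):
    """Return the transcript with [laugh] inserted after each index in laugh_after."""
    out = []
    for i, word in enumerate(words):
        out.append(word)
        if i in laugh_after:
            out.append("[laugh]")
    return " ".join(out)

line = annotate("that was so funny".split(), laugh_after={3})
print(line)  # that was so funny [laugh]
```

The annotated text would then be placed in the training metadata (e.g., an LJSpeech-style `metadata.csv`), and the tokenizer would need to treat `[laugh]` as a single unit rather than splitting it into characters.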
Thank you for considering this request. You are doing a fantastic job.