Closed rodrigoGA closed 8 months ago
Any update on this?
Hi @rodrigoGA, thanks for your interest in the NeMo TTS toolkit. If you stick with our current phoneme-based TTS models, such as FastPitch, you would need to add new IPA dictionary entries for those filler words, and ideally have paired filler speech as well.
As far as I know, we haven't added support for disfluency synthesis yet. It would only work if the training corpus contains filler speech/text pairs and the corresponding phonemes.
If you have filler speech/text pairs but can't figure out canonical phonemes for the filler words, you could try using a grapheme-based tokenizer for FastPitch.
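To make the two options above concrete, here is a minimal sketch in plain Python. It is not NeMo's tokenizer API. The names `FILLER_ENTRIES`, `merge_entries`, and `tokenize` are hypothetical, and the IPA strings are rough guesses for illustration only, not canonical pronunciations. It shows the general idea: add filler words to a pronunciation dictionary, and fall back to graphemes for any word the dictionary does not cover.

```python
import re

# Hypothetical filler entries. The IPA symbols are illustrative guesses,
# not canonical pronunciations and not taken from any NeMo dictionary.
FILLER_ENTRIES = {
    "uh": ["ʌ"],
    "um": ["ʌ", "m"],
    "mm-hmm": ["m", "h", "m"],
}

# Matches lowercase words, keeping internal hyphens/apostrophes ("mm-hmm").
TOKEN_RE = re.compile(r"[a-z]+(?:[-'][a-z]+)*")


def merge_entries(base, extra):
    """Return a copy of `base` with `extra` added; existing words are kept."""
    merged = dict(base)
    for word, phonemes in extra.items():
        merged.setdefault(word.lower(), list(phonemes))
    return merged


def tokenize(text, pron_dict):
    """Use phonemes for known words and fall back to graphemes otherwise."""
    tokens = []
    for word in TOKEN_RE.findall(text.lower()):
        if word in pron_dict:
            tokens.append(("phonemes", pron_dict[word]))
        else:
            tokens.append(("graphemes", list(word)))
    return tokens


if __name__ == "__main__":
    base_dict = {"hello": ["h", "ə", "l", "oʊ"]}
    pron_dict = merge_entries(base_dict, FILLER_ENTRIES)
    for kind, symbols in tokenize("Mm-hmm, hello erm", pron_dict):
        print(kind, symbols)
```

Either way, the model can only learn to pronounce these tokens if the training data contains matching filler speech, as noted above.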
Hi, thanks for the detailed response. It's really interesting and useful to consider support for disfluency synthesis. I hope this feature will be considered for future training versions.
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.
I am working on a project that involves creating dialogues that sound spontaneous and natural. A key feature I would like to implement is the use of disfluencies (such as “Mm-hmm”) during moments when calculations are being made or there are pauses in the dialogue.
So far, I have experimented with using phonemes to create these disfluencies, but the results have not been satisfactory. It's possible that I haven't found the correct IPA pronunciation for these expressions. (If you have any IPA pronunciation suggestions to try, they would be greatly appreciated).
My questions are as follows: