Thanks for raising this issue! However, we believe a similar issue already exists. Kindly go through all the open issues and ask to be assigned to that issue.
Problem statement
Previous TTS models often produced robotic-sounding speech, mispronounced words, lacked emotional nuance, struggled with contextual understanding, offered limited language support, and provided little customization over voice characteristics.
Solution: A non-end-to-end TTS model with human-like speech
Build a non-end-to-end TTS model with Transformers that uses separate components for text processing, phoneme prediction, and waveform generation, improving pronunciation, prosody, and customization (see the sketch below).
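To make the proposed decoupling concrete, here is a minimal PyTorch sketch of the three-stage pipeline. All class names (`TextFrontend`, `AcousticModel`, `Vocoder`), the character-level phoneme lookup, and the toy vocoder are hypothetical placeholders for illustration, not an existing ML-Nexus API; a real frontend would do grapheme-to-phoneme conversion, and a real vocoder would be a neural model such as WaveRNN or HiFi-GAN.

```python
# Minimal sketch of the proposed modular (non-end-to-end) pipeline.
# Stages are kept separate so each can be trained and swapped independently.
import torch
import torch.nn as nn


class TextFrontend:
    """Maps raw text to phoneme IDs (placeholder: character-level lookup)."""

    def __init__(self, phoneme_vocab: dict):
        self.vocab = phoneme_vocab

    def __call__(self, text: str) -> torch.Tensor:
        # Real systems apply text normalization and G2P conversion here.
        ids = [self.vocab.get(ch, 0) for ch in text.lower()]
        return torch.tensor(ids, dtype=torch.long).unsqueeze(0)  # (1, T)


class AcousticModel(nn.Module):
    """Transformer encoder predicting mel-spectrogram frames from phonemes.

    Positional encoding is omitted for brevity.
    """

    def __init__(self, vocab_size: int, d_model: int = 256, n_mels: int = 80):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.to_mel = nn.Linear(d_model, n_mels)

    def forward(self, phoneme_ids: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(phoneme_ids))  # self-attention over phonemes
        return self.to_mel(h)  # (1, T, n_mels)


class Vocoder(nn.Module):
    """Toy waveform generator: upsamples each mel frame to `hop` samples."""

    def __init__(self, n_mels: int = 80, hop: int = 256):
        super().__init__()
        self.upsample = nn.Linear(n_mels, hop)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        return self.upsample(mel).flatten(1)  # (1, T * hop) raw samples


# Wiring the stages together:
frontend = TextFrontend({c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz ")})
acoustic = AcousticModel(vocab_size=28)
vocoder = Vocoder()

with torch.no_grad():
    mel = acoustic(frontend("hello world"))
    audio = vocoder(mel)
print(audio.shape)  # torch.Size([1, 2816]) for the 11-character input
```

Because the stages only communicate through plain tensors (phoneme IDs, mel frames), each component can be retrained or replaced without touching the others, which is the main customization argument for the non-end-to-end design.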
Alternatives
Alternative TTS systems include two-stage models like Tacotron 2 and Deep Voice, which pair an acoustic model with a separate neural vocoder such as WaveRNN, and non-autoregressive models such as FastSpeech. However, these are usually computationally expensive or difficult to train.
Additional context
Transformer TTS models apply the self-attention mechanism of Transformers, originally designed for NLP, to text-to-speech synthesis, capturing long-range dependencies in the input text. This architecture enables more natural and expressive speech by decoupling components such as text encoding, phoneme prediction, and waveform generation, allowing finer control over pronunciation and emotional tone. Additionally, attention mechanisms align the text input with audio features, improving the accuracy of speech generation; a small cross-attention example follows below.
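As a hedged illustration of that text-to-audio alignment, the snippet below uses PyTorch's `nn.MultiheadAttention` as a stand-in for a Transformer decoder's cross-attention. The tensor shapes and random inputs are made up for demonstration; the point is that the returned weight matrix acts as a soft alignment between mel frames and phonemes.

```python
# Cross-attention as a soft text-to-audio alignment: decoder queries
# (mel frames) attend over encoder keys/values (phoneme encodings).
import torch
import torch.nn as nn

d_model = 256
attn = nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)

phoneme_memory = torch.randn(1, 12, d_model)  # encoder output: 12 phonemes
mel_queries = torch.randn(1, 40, d_model)     # decoder states: 40 mel frames

out, weights = attn(mel_queries, phoneme_memory, phoneme_memory)
print(out.shape)      # torch.Size([1, 40, 256]) attended features per frame
print(weights.shape)  # torch.Size([1, 40, 12]) alignment: frame -> phoneme
```

In a trained model, each row of `weights` concentrates on the phoneme being spoken at that frame, which is what makes attention-based alignment more robust than hand-crafted duration rules.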