UppuluriKalyani / ML-Nexus

ML Nexus is an open-source collection of machine learning projects, covering topics like neural networks, computer vision, and NLP. Whether you're a beginner or expert, contribute, collaborate, and grow together in the world of AI. Join us to shape the future of machine learning!
https://ml-nexus.vercel.app/
MIT License

Build a non-end-to-end Text-to-Speech model #578

Closed PrasannaKasar closed 4 weeks ago

PrasannaKasar commented 4 weeks ago

Problem statement
Previous TTS models often produced robotic-sounding speech, mispronounced words, lacked emotional nuance, struggled with contextual understanding, offered limited language support, and provided little customization for voice characteristics.

Solution
A non-end-to-end, Transformer-based TTS model with human-like speech, using separate components for text processing, phoneme prediction, and waveform generation to enhance pronunciation, prosody, and customization.
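The three decoupled stages could be sketched roughly as below. This is a minimal, hypothetical illustration of the pipeline interfaces (the function names, toy lexicon, and stubbed vocoder are my own assumptions, not part of ML-Nexus); a real system would use a trained grapheme-to-phoneme model, an acoustic model, and a neural vocoder.

```python
# Sketch of a non-end-to-end TTS pipeline with three separate stages.
# All names and the toy lexicon are illustrative assumptions.

# Stage 1: text front-end -- normalize raw text into word tokens.
def normalize_text(text: str) -> list[str]:
    return text.lower().strip().split()

# Stage 2: phoneme prediction -- a toy grapheme-to-phoneme lookup;
# a real system would use a trained G2P model or pronunciation dictionary.
TOY_LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

def predict_phonemes(tokens: list[str]) -> list[str]:
    phonemes = []
    for token in tokens:
        # Fall back to spelling unknown words out letter by letter.
        phonemes.extend(TOY_LEXICON.get(token, list(token.upper())))
    return phonemes

# Stage 3: waveform generation -- stubbed; a real vocoder (e.g. WaveRNN)
# would map acoustic features to audio samples.
def generate_waveform(phonemes: list[str]) -> list[float]:
    # One dummy "sample" per phoneme, just to show the interface.
    return [0.0 for _ in phonemes]

def synthesize(text: str) -> list[float]:
    tokens = normalize_text(text)        # text processing
    phonemes = predict_phonemes(tokens)  # phoneme prediction
    return generate_waveform(phonemes)   # waveform generation

print(len(synthesize("hello world")))  # 8 phonemes -> 8 dummy samples
```

Because each stage is a separate component with its own interface, any one of them (e.g. the phoneme predictor) can be retrained or swapped without touching the others, which is the customization advantage the issue describes.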

Alternatives
Alternative TTS approaches include two-stage models such as Tacotron 2 and Deep Voice, and non-autoregressive models such as FastSpeech, typically paired with a neural vocoder like WaveRNN. However, these are often computationally expensive and difficult to train.

Additional context
Transformer TTS models use the self-attention mechanism of Transformers, originally designed for NLP, to improve text-to-speech synthesis by capturing long-range dependencies in text. This architecture enables more natural and expressive speech by decoupling components like text encoding, phoneme prediction, and waveform generation, allowing greater control over pronunciation and emotional tone. Additionally, attention mechanisms help align the text input with audio features, improving the accuracy of speech generation.
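The text-to-audio alignment mentioned above can be illustrated with a toy scaled dot-product attention computation: each output audio frame forms a query that attends over the encoded phoneme sequence, and the softmax weights form the alignment. A minimal NumPy sketch, with purely illustrative shapes (5 frames, 3 phonemes, dimension 8) and random values standing in for learned representations:

```python
import numpy as np

# Toy scaled dot-product attention: 5 decoder (audio-frame) queries
# attend over 3 encoder (phoneme) states. All values are random
# placeholders for learned representations.
rng = np.random.default_rng(0)
d = 8
queries = rng.normal(size=(5, d))  # one query per output audio frame
keys = rng.normal(size=(3, d))     # one key per input phoneme
values = rng.normal(size=(3, d))   # phoneme features to mix

scores = queries @ keys.T / np.sqrt(d)         # (5, 3) alignment logits
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)  # softmax over phonemes
context = weights @ values                     # (5, d) attended features

print(weights.shape, context.shape)  # (5, 3) (5, 8)
```

Each row of `weights` sums to 1 and tells the model which phonemes a given audio frame should attend to, which is exactly the alignment role attention plays in Transformer TTS.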

github-actions[bot] commented 4 weeks ago

Thanks for creating the issue in ML-Nexus!🎉 Before you start working on your PR, please make sure to:

github-actions[bot] commented 4 weeks ago

Thanks for raising this issue! However, we believe a similar issue already exists. Kindly go through all the open issues and ask to be assigned to that issue.