Lightning-Universe / lightning-flash

Your PyTorch AI Factory - Flash enables you to easily configure and run complex AI recipes for over 15 tasks across 7 data domains
https://lightning-flash.readthedocs.io
Apache License 2.0
1.74k stars, 213 forks

TTS speech Generation #1113

Open flozi00 opened 2 years ago

flozi00 commented 2 years ago

🚀 Feature

Motivation

Pitch

Alternatives

I think Coqui is one very good provider.

Additional context

ethanwharris commented 2 years ago

Hey @flozi00 thanks for the suggestion! Would you be interested in trying to contribute this task to Flash? We can help you out if there's anything you need :smiley:

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

joowon-dm-snu commented 2 years ago

Hello @ethanwharris, do you think TTS can be implemented soon?

uakarsh commented 2 years ago

Hi @ethanwharris, I am currently interested in implementing deep learning models (especially multi-modal transformers; my recent work is here). Would you mind if I take a look at TTS speech generation and its models, and then get back to you? I want to explore a new domain, and speech recognition would be an amazing one to explore.

krshrimali commented 2 years ago

Hi @uakarsh, great to hear from you! Thank you for showing interest. The team is on a company holiday this week, so we're sorry if we were slow to respond, but please go ahead and explore this issue. We're more than happy to see where this goes, and I've assigned the issue to you. Please reach out if you need any help. :)

uakarsh commented 2 years ago

Awesome then, looking forward to contributing something amazing to Flash.

uakarsh commented 2 years ago

Hi @ethanwharris @krshrimali, I have been exploring audio processing and TTS for the past few days. I think there are a few pieces to consider, such as audio transformations (similar to torchvision.transforms) and different models.

How about we integrate this into Flash? It would help with loading any type of audio dataset with text and training or fine-tuning on it. (I guess I would have to rewrite it entirely so that users can use it properly.)

There are many types of models out there, but this one is an end-to-end model, so I definitely wanted to pitch it. Over time, we can try to add more models. (I'm not sure how to integrate Hugging Face models, since I was not able to find proper training scripts, but I will search more.)
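
To make the "audio transformations (similar to torchvision.transforms)" idea a bit more concrete, here is a rough sketch of what a composable audio transform pipeline could look like. It is not Flash or torchaudio code: `Compose`, `PeakNormalize`, and `PreEmphasis` are hypothetical names written in plain Python for illustration, and a real integration would presumably operate on tensors rather than lists of floats.

```python
from typing import Callable, List, Sequence

# Assumption: a waveform is a plain list of float samples (mono).
Waveform = List[float]


class Compose:
    """Apply transforms in order, mirroring the torchvision.transforms.Compose pattern."""

    def __init__(self, transforms: Sequence[Callable[[Waveform], Waveform]]) -> None:
        self.transforms = list(transforms)

    def __call__(self, waveform: Waveform) -> Waveform:
        for transform in self.transforms:
            waveform = transform(waveform)
        return waveform


class PeakNormalize:
    """Hypothetical transform: scale so the largest absolute sample equals `peak`."""

    def __init__(self, peak: float = 1.0) -> None:
        self.peak = peak

    def __call__(self, waveform: Waveform) -> Waveform:
        largest = max((abs(sample) for sample in waveform), default=0.0)
        if largest == 0.0:
            # Silent or empty input: nothing to scale.
            return list(waveform)
        return [sample * self.peak / largest for sample in waveform]


class PreEmphasis:
    """Hypothetical transform: y[n] = x[n] - coeff * x[n-1], with y[0] = x[0]."""

    def __init__(self, coeff: float = 0.97) -> None:
        self.coeff = coeff

    def __call__(self, waveform: Waveform) -> Waveform:
        if not waveform:
            return []
        output = [waveform[0]]
        for previous, current in zip(waveform, waveform[1:]):
            output.append(current - self.coeff * previous)
        return output


# Usage: build a pipeline once, then call it on each waveform.
pipeline = Compose([PreEmphasis(coeff=0.5), PeakNormalize()])
processed = pipeline([0.0, 0.5, -0.25, 0.125])
print(processed)
```

Keeping each transform a small callable means the same pipeline object can be reused at both training and inference time, which is the main appeal of the torchvision-style design.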

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.