BenAAndrew / Voice-Cloning-App

A Python/PyTorch app for easily synthesising human voices
BSD 3-Clause "New" or "Revised" License

Language Support #20

Closed ChrisDelClea closed 3 years ago

ChrisDelClea commented 3 years ago

Hey Folks,

I really like your GUI and its ease of use. Do you know, or can you estimate, when you will support German?

Support for Linux would be a nice feature too.

Best regards Chris

BenAAndrew commented 3 years ago

Hi @ChrisChross, I would like to support German at some point but it may be a while. It would be good if someone else could get involved and give it a go.

Linux is already supported.

ChrisDelClea commented 3 years ago

Hi @BenAAndrew,

I tried the UI and it works quite well. The workflow from dataset creation through training to synthesis is especially good. What I was not able to do was train my own model, as I have a Mac and AMD GPU support is missing. That's a pity, as I could not do it on another machine, but I hope the support is coming soon.

As far as I understand, the WaveGlow model is language independent, right? I am no expert though. Can you give me some hints on what changes are necessary to create a German voice cloning model, and how large the dataset would have to be?
I saw there is one slot for a text file and one for audio. In my case I have several of both. I hope this is valuable feedback for you too.

Best regards Chris

BenAAndrew commented 3 years ago

Hi @ChrisChross,

Firstly, AMD support is not available as PyTorch does not currently support it (although it's in development).

Secondly, to add support for other languages, a few things will be necessary:

  1. The transcription process (in https://github.com/BenAAndrew/Voice-Cloning-App/blob/main/dataset/transcribe.py) will need to be able to take the language as an argument, and download the relevant model
  2. Places where the metadata.csv file is loaded (such as https://github.com/BenAAndrew/Voice-Cloning-App/blob/main/training/train.py) will need to support larger character sets (currently most places may only support UTF-8)
  3. The symbols list (in https://github.com/BenAAndrew/Voice-Cloning-App/blob/main/training/train.py) will need to allow for different character sets
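As a rough illustration of points 2 and 3, here is a minimal sketch of a per-language symbols list and a metadata loader that checks each transcript against it. Everything in it is an assumption made for illustration rather than the app's actual code: the names `BASE_SYMBOLS`, `EXTRA_SYMBOLS`, `build_symbols` and `load_metadata` are hypothetical, and the `filename|text` row layout is assumed for metadata.csv.

```python
import io

# Hypothetical base symbol set; the app's real symbols list may differ.
BASE_SYMBOLS = list(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    " !'(),-.:;?"
)

# Hypothetical per-language additions (German umlauts and eszett).
EXTRA_SYMBOLS = {"en": "", "de": "äöüÄÖÜß"}


def build_symbols(language):
    """Return the base symbols plus any extras for the given language."""
    return BASE_SYMBOLS + list(EXTRA_SYMBOLS[language])


def load_metadata(lines, symbols):
    """Parse 'filename|text' rows (assumed layout) and reject unknown characters."""
    allowed = set(symbols)
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        filename, text = line.split("|", 1)
        unknown = sorted(set(text) - allowed)
        if unknown:
            raise ValueError(f"{filename}: unsupported characters {unknown}")
        rows.append((filename, text))
    return rows


# Usage with an in-memory file; for a real file, open it with an explicit
# encoding, e.g. open(path, encoding="utf-8").
sample = io.StringIO("a.wav|Größe ist schön.\n")
print(load_metadata(sample, build_symbols("de")))
```

Validating at load time means an unsupported character is reported before training starts, rather than surfacing later as a missing symbol.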

As you pointed out, the WaveGlow model is independent, so you may need a pretrained model for your language from somewhere else. Other than that, I don't believe anything else will differ.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
