Making a model for the Russian language

ks-sav commented 3 years ago

I am starting work on a model for the Russian language.

I verified that the code works on my platform, using LibriSpeech train-clean-100.
I prepared 380 hours of Russian speech (1270 speakers) in this format: #437 (comment)

Now I need your advice:

Do I need to add English to my dataset?
Can I continue training from an existing synthesizer model, or is it better to train my own?
Do I have any chance of doing this on CPU?

ghost commented 3 years ago

Advice

I do not recommend adding English, but it is something you can try if you need a model that works for both languages.
Train a new synthesizer model. Don't forget to edit synthesizer/utils/symbols.py to include all the letters of the Russian alphabet. Here is a good start for Russian: symbols.py
Realistically, CPU is too slow. The model needs to learn attention before inference will work. This usually requires 10,000 to 20,000 steps. The training speed on CPU is anywhere from 1 to 4 steps per minute. So you will be waiting 1 to 2 weeks until you know whether your settings are correct. Even after attention is learned, you will be waiting another month or longer to train the 100,000 to 200,000 steps that it takes for the model to become usable.
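The CPU estimate above can be checked with simple arithmetic. The sketch below only converts the step counts and the 1 to 4 steps per minute quoted in this comment into days. The function name `training_days` is made up for illustration and is not part of the repo.

```python
def training_days(steps, steps_per_minute):
    """Convert a step count and a training speed into wall-clock days."""
    minutes = steps / steps_per_minute
    return minutes / (60 * 24)


# Step ranges and CPU speeds quoted in the comment above.
phases = {
    "learn attention": (10_000, 20_000),
    "usable model": (100_000, 200_000),
}

for name, (fewest_steps, most_steps) in phases.items():
    best_case = training_days(fewest_steps, 4)   # fewest steps at 4 steps/min
    worst_case = training_days(most_steps, 1)    # most steps at 1 step/min
    print(f"{name}: {best_case:.1f} to {worst_case:.1f} days")
```

At the slow end (1 step per minute), 20,000 steps take about 14 days, which matches the "1 to 2 weeks" figure. At the same speed, 200,000 steps take about 139 days.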

If you do not have access to a GPU, try to set up this repo, which has a Russian pretrained model. Note: It uses tensorflow and you will need to apply the synthesizer changes in #366 to make it work on CPU. https://github.com/vlomme/Multi-Tacotron-Voice-Cloning

ks-sav commented 3 years ago

What could be causing this warning

RuntimeWarning: invalid value encountered in true_divide
wav = wav / np.abs(wav).max() * params.rescaling_max

when running the script synthesizer_preprocess_audio.py?

ks-sav commented 3 years ago

Also, I want to share this for those who have similar problems:

Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError when running synthesizer_preprocess_audio.py is solved by using --n_processes 1
Instead of the webrtcvad library, I use webrtcvad-wheels on Windows 10

ghost commented 3 years ago

RuntimeWarning: invalid value encountered in true_divide

Check for audio files that are completely silent.
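A completely silent file has a peak amplitude of zero, so the normalization line divides zero by zero and NumPy reports the result as an invalid value. The sketch below is one way to look for such files using only the standard library. It assumes 16-bit PCM WAV files and a hypothetical dataset root; for other formats or bit depths, a different reader would be needed. The function names are made up for illustration.

```python
import array
import sys
import wave
from pathlib import Path


def wav_peak(path):
    """Return the largest absolute sample value in a 16-bit PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError(f"{path}: only 16-bit PCM is handled in this sketch")
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    # A file with no samples at all is treated as silent too.
    return max((abs(s) for s in samples), default=0)


def find_silent_wavs(root):
    """List WAV files under root whose peak amplitude is exactly zero."""
    return sorted(p for p in Path(root).rglob("*.wav") if wav_peak(p) == 0)
```

Calling `find_silent_wavs` on the dataset directory returns the files that would trigger the warning, so they can be removed or re-recorded before preprocessing.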

RAVANv2 commented 3 years ago

@blue-fish I would like to retrain all the models. Is there any problem with using a Google Colab GPU for training? Is it sufficient?

ghost commented 3 years ago

@RAVANv2 We do not provide support for Colab. It can be done, but you'll have to figure it out on your own.

fancat-programer commented 3 years ago

Advice

* I do not recommend adding English, but it is something you can try if you need a model that works for both languages.

* Train a new synthesizer model. Don't forget to edit `synthesizer/utils/symbols.py` to include all the letters of the Russian alphabet. Here is a good start for Russian: [`symbols.py`](https://github.com/vlomme/Multi-Tacotron-Voice-Cloning/blob/master/synthesizer/utils/symbols.py)

* Realistically, CPU is too slow. The model needs to learn attention before inference will work. This usually requires 10,000 to 20,000 steps. The training speed on CPU is anywhere from 1 to 4 steps per **minute**. So you will be waiting 1 to 2 weeks until you know whether your settings are correct. Even after attention is learned, you will be waiting another month or longer to train the 100,000 to 200,000 steps that it takes for the model to become usable.

If you do not have access to a GPU, try to set up this repo, which has a Russian pretrained model. Note: It uses tensorflow and you will need to apply the synthesizer changes in #366 to make it work on CPU. https://github.com/vlomme/Multi-Tacotron-Voice-Cloning

This model is really awful.

neonsecret commented 2 years ago

See my fork: https://github.com/neonsecret/Real-Time-Voice-Cloning-Multilang It is adjusted to train a bilingual ru+en model and is easy to adapt for new languages.

vorob1 commented 2 years ago

See my fork: https://github.com/neonsecret/Real-Time-Voice-Cloning-Multilang It is adjusted to train a bilingual ru+en model and is easy to adapt for new languages.

Sir, that's exactly what I'm looking for. I want to fix some incorrect voiceover in an old game, but since I can't get in touch with the actor, I want to simulate his voice.

The original tool works, but it can't do a Russian voice: https://youtu.be/lDbpoaaBJSo Your fork gives me errors:

PS C:\Users\babud\Downloads\Real-Time-Voice-Cloning-Multilang-master> python demo_toolbox.py
Traceback (most recent call last):
  File "demo_toolbox.py", line 7, in <module>
    from utils.default_models import ensure_default_models
ModuleNotFoundError: No module named 'utils.default_models'

My knowledge of all this Python stuff is low, so I just copy and paste commands and sometimes try to understand the errors, but this looks unsolvable at my level of knowledge.

I want a simple thing: launch the GUI, point the program to WAV files with the actor's voice, enter text, and get voiceover files :)

I also tried python demo_cli.py and got lots of output, but in the end it was this:

FileNotFoundError: [Errno 2] No such file or directory: 'saved_models\\rusmodeltweaked\\synthesizer.pt'

vorob1 commented 2 years ago

Okay, I managed to start the toolbox by copying some files from the original build. Now when I add a WAV and try synthesize + vocode, I get this error:

size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([66, 512]) from checkpoint, the shape in current model is torch.Size([194, 512]).
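Read literally, this message says the checkpoint's text embedding has 66 rows while the model built by the current code expects 194. In a Tacotron-style synthesizer the embedding normally has one row per entry in the symbol list, so this usually means the checkpoint was trained with a different `symbols.py` than the code now loading it. The sketch below compares the two counts. The helper name is hypothetical, and the `"model_state"` key in the usage comment is an assumption about how the checkpoint is stored.

```python
def compare_symbol_counts(state_dict, symbols, key="encoder.embedding.weight"):
    """Return (rows_in_checkpoint, symbols_in_code, counts_match).

    state_dict: mapping from parameter names to tensors (anything with .shape).
    symbols:    the symbol list the current code builds the model from.
    key:        parameter name taken from the error message above.
    """
    rows = state_dict[key].shape[0]
    return rows, len(symbols), rows == len(symbols)


# Usage sketch (requires PyTorch; the checkpoint layout is an assumption):
# import torch
# from synthesizer.utils.symbols import symbols
# ckpt = torch.load("saved_models/rusmodeltweaked/synthesizer.pt", map_location="cpu")
# state = ckpt.get("model_state", ckpt)
# print(compare_symbol_counts(state, symbols))
```

If the counts differ, the checkpoint and the symbol list do not belong together. Either use a checkpoint trained with the fork's symbol set or restore the symbol list that the checkpoint was trained with.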

CorentinJ / Real-Time-Voice-Cloning

Making a model for the Russian language #707