Closed (ks-sav closed this issue 3 years ago)
What could cause the following warning when running the script `synthesizer_preprocess_audio.py`?
RuntimeWarning: invalid value encountered in true_divide
wav = wav / np.abs(wav).max() * params.rescaling_max
Also, I want to share this for those who have similar problems:
Can't pickle <class 'MemoryError'>: it's not the same object as builtins.MemoryError
This error when running `synthesizer_preprocess_audio.py` is solved by using `--n_processes 1`.
For the `webrtcvad` library, I use `webrtcvad-wheels` on Windows 10.
RuntimeWarning: invalid value encountered in true_divide
Check for audio files that are completely silent.
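A completely silent clip has a peak amplitude of zero, so the normalization `wav / np.abs(wav).max()` becomes 0/0 and NumPy emits the warning above. The following is a minimal sketch for locating such files before preprocessing. It assumes the dataset consists of 16-bit PCM `.wav` files; the function names `wav_peak` and `find_silent_wavs` are hypothetical and not part of the repository.

```python
import array
import sys
import wave
from pathlib import Path


def wav_peak(path):
    """Return the largest absolute sample value in a 16-bit PCM WAV file."""
    with wave.open(str(path), "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError(f"{path}: expected 16-bit PCM samples")
        frames = wf.readframes(wf.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is stored little-endian
    # An empty file has no samples, so treat its peak as zero.
    return max((abs(s) for s in samples), default=0)


def find_silent_wavs(root):
    """List WAV files under root whose peak is zero (silent or empty)."""
    return sorted(p for p in Path(root).rglob("*.wav") if wav_peak(p) == 0)
```

Calling `find_silent_wavs` on the dataset directory returns the paths that would trigger the division by zero, so they can be removed or replaced before rerunning the preprocessing script. Other formats (for example FLAC) would need a different reader.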
@blue-fish I would like to retrain all models. Is there any problem if I use a Google Colab GPU for training? Is it sufficient?
@RAVANv2 We do not provide support for Colab. It can be done, but you'll have to figure it out on your own.
Advice
* I do not recommend adding English, but it is something you can try if you need a model that works for both languages.
* Train a new synthesizer model. Don't forget to edit `synthesizer/utils/symbols.py` to include all the letters of the Russian alphabet. Here is a good start for Russian: [`symbols.py`](https://github.com/vlomme/Multi-Tacotron-Voice-Cloning/blob/master/synthesizer/utils/symbols.py)
* Realistically, CPU is too slow. The model needs to learn attention before inference will work. This usually requires 10,000 to 20,000 steps. The training speed on CPU is anywhere from 1 to 4 steps per **minute**. So you will be waiting 1 to 2 weeks until you know whether your settings are correct. Even after attention is learned, you will be waiting another month or longer to train the 100,000 to 200,000 steps that it takes for the model to become usable.
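To illustrate the `symbols.py` edit mentioned above, here is a minimal sketch of a Russian-only symbol set. The variable names follow the usual Tacotron-style layout, but they are assumptions and this is not the repository's actual file or the linked one; `text_to_ids` is a hypothetical helper added only for demonstration.

```python
# Hypothetical sketch of a Russian-only symbol set; names are assumptions
# modeled on a Tacotron-style symbols.py, not the repository's file.
_pad = "_"  # padding symbol
_eos = "~"  # end-of-sequence symbol

# Lowercase а..я occupy U+0430..U+044F (32 letters); ё (U+0451) lies outside
# that range and has to be added explicitly.
_lower = "".join(chr(c) for c in range(0x0430, 0x0450)) + "ё"
_upper = _lower.upper()
_punctuation = "!'\"(),-.:;? "

symbols = [_pad, _eos] + list(_upper + _lower + _punctuation)

# Every symbol needs a unique integer ID for the text embedding layer.
symbol_to_id = {s: i for i, s in enumerate(symbols)}


def text_to_ids(text):
    """Map text to symbol IDs, skipping characters that are not in the set."""
    return [symbol_to_id[ch] for ch in text if ch in symbol_to_id]
```

In Tacotron-style models the number of symbols typically sets the first dimension of the text embedding, so a checkpoint trained with a different symbol list generally cannot be loaded without retraining.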
If you do not have access to a GPU, try to set up this repo, which has a Russian pretrained model. Note: It uses tensorflow and you will need to apply the synthesizer changes in #366 to make it work on CPU. https://github.com/vlomme/Multi-Tacotron-Voice-Cloning
This model is too awful.
See my fork: https://github.com/neonsecret/Real-Time-Voice-Cloning-Multilang. It is adjusted to train a bilingual ru+en model and is easy to adapt for new languages.
Sir, that's exactly what I'm looking for. I want to correct some wrong voiceover in an old game, but since I can't get in touch with the actor, I want to simulate his voice.
The tool in question works, but it can't do a Russian voice: https://youtu.be/lDbpoaaBJSo. Your fork gives me errors:
PS C:\Users\babud\Downloads\Real-Time-Voice-Cloning-Multilang-master> python demo_toolbox.py
Traceback (most recent call last):
File "demo_toolbox.py", line 7, in <module>
from utils.default_models import ensure_default_models
ModuleNotFoundError: No module named 'utils.default_models'
My knowledge of all this Python stuff is low, so I just copy and paste commands and sometimes try to understand the errors, but this looks unsolvable at my level of knowledge.
I want a simple thing: launch the GUI, point the program to WAV files with the actor's voice, enter text, and get voiceover files :)
I also tried `python demo_cli.py` and got lots of output, but in the end it was this:
FileNotFoundError: [Errno 2] No such file or directory: 'saved_models\\rusmodeltweaked\\synthesizer.pt'
Okay, I managed to start the toolbox by copying some files from the original build. Now when I add a WAV file and try synthesize + vocode, I get this error:
size mismatch for encoder.embedding.weight: copying a param with shape torch.Size([66, 512]) from checkpoint, the shape in current model is torch.Size([194, 512]).
I am proceeding to create a model for the Russian language.
Now I need your advice.