xzVice closed this issue 3 years ago
Please start by reading my advice on training. This contains the link to training documentation: https://github.com/CorentinJ/Real-Time-Voice-Cloning/issues/431#issuecomment-673555684
If I were doing this, I would reuse the encoder and vocoder models. For the synthesizer, you have the option of training from scratch or finetuning the English model. Training from scratch should give better pronunciation and prosody. Finetuning will reduce training time and possibly have better voice similarity. If you finetune, modify the text cleaner to remove diacritics from vowels (change à to a, è and é to e, etc.). This is necessary since the English synthesizer does not include these characters in symbols.py.
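A minimal sketch of such a cleaner, assuming a plain Unicode-normalization approach (the function name `strip_diacritics` is hypothetical and not part of the repo; the actual change would go in the synthesizer's text cleaners):

```python
import unicodedata

def strip_diacritics(text):
    """Replace accented vowels with their base characters (à -> a, é -> e, ...).

    NFD normalization splits each accented character into a base character
    plus combining marks; dropping the marks (Unicode category 'Mn') leaves
    only characters the English symbols.py already knows about.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

print(strip_diacritics("perché è già così"))  # perche e gia cosi
```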
So, I tried doing what you told me to do and everything went well until the synthesizer_train.py command... Here is the output of all the commands from https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Training (up to the train one, of course, which threw the error). Any idea? 🤔
I also noticed some weird symbols inside the SV2TTS/synthesizer/train.txt file... is that normal? I tried editing the symbols.py/cleaners.py files, but that didn't fix it... In any case, this is probably not what's causing the train command to crash...
C:\Users\Workspace\Desktop\Real-Time-Voice-Cloning>py -3.6 synthesizer_preprocess_audio.py datasets_root --datasets_name LibriTTS --subfolders testing --no_alignments
Arguments:
datasets_root: datasets_root
out_dir: datasets_root\SV2TTS\synthesizer
n_processes: None
skip_existing: False
hparams:
no_alignments: True
datasets_name: LibriTTS
subfolders: testing
Using data from:
datasets_root\LibriTTS\testing
LibriTTS: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:09<00:00, 9.52s/speakers]
The dataset consists of 9 utterances, 7450 mel frames, 1488960 audio timesteps (0.03 hours).
Max input length (text chars): 140
Max mel frames length: 889
Max audio timesteps length: 177600
C:\Users\Workspace\Desktop\Real-Time-Voice-Cloning>python synthesizer_preprocess_embeds.py datasets_root/SV2TTS/synthesizer
Arguments:
synthesizer_root: datasets_root\SV2TTS\synthesizer
encoder_model_fpath: encoder\saved_models\pretrained.pt
n_processes: 4
Embedding: 0%| | 0/9 [00:00<?, ?utterances/s]Loaded encoder "pretrained.pt" trained to step 1564501
Loaded encoder "pretrained.pt" trained to step 1564501
Loaded encoder "pretrained.pt" trained to step 1564501
Loaded encoder "pretrained.pt" trained to step 1564501
Embedding: 100%|█████████████████████████████████████████████████████████████████| 9/9 [00:05<00:00, 1.73utterances/s]
C:\Users\Workspace\Desktop\Real-Time-Voice-Cloning>python synthesizer_train.py testing datasets_root/SV2TTS/synthesizer
Arguments:
run_id: testing
syn_dir: datasets_root/SV2TTS/synthesizer
models_dir: synthesizer/saved_models/
save_every: 1000
backup_every: 25000
force_restart: False
hparams:
Checkpoint path: synthesizer\saved_models\testing\testing.pt
Loading training data from: datasets_root\SV2TTS\synthesizer\train.txt
Using model: Tacotron
Using device: cpu
Initialising Tacotron Model...
Trainable Parameters: 30.876M
Starting the training of Tacotron from scratch
Using inputs from:
datasets_root\SV2TTS\synthesizer\train.txt
datasets_root\SV2TTS\synthesizer\mels
datasets_root\SV2TTS\synthesizer\embeds
Found 9 samples
+----------------+------------+---------------+------------------+
| Steps with r=2 | Batch Size | Learning Rate | Outputs/Step (r) |
+----------------+------------+---------------+------------------+
| 20k Steps | 12 | 0.001 | 2 |
+----------------+------------+---------------+------------------+
Traceback (most recent call last):
File "synthesizer_train.py", line 35, in <module>
train(**vars(args))
File "C:\Users\Workspace\Desktop\Real-Time-Voice-Cloning\synthesizer\train.py", line 158, in train
for i, (texts, mels, embeds, idx) in enumerate(data_loader, 1):
File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 355, in __iter__
return self._get_iterator()
File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 301, in _get_iterator
return _MultiProcessingDataLoaderIter(self)
File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\dataloader.py", line 914, in __init__
w.start()
File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 223, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'train.<locals>.<lambda>'
C:\Users\Workspace\Desktop\Real-Time-Voice-Cloning>Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
EOFError: Ran out of input
I don't have time to fully troubleshoot issues, but this may help. If not, you'll need to figure it out yourself.
The problem may be coming from this line, which reads the transcripts: https://github.com/CorentinJ/Real-Time-Voice-Cloning/blob/b5ba6d0371882dbab595c48deb2ff17896547de7/synthesizer/preprocess.py#L77
Try adding utf-8 file encoding.
with text_fpath.open("r", encoding="utf-8") as text_file:
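For reference, the "weird symbols" in train.txt are what UTF-8 text looks like when it is decoded with a legacy Windows codepage, which is what `open()` without an explicit encoding can fall back to on Windows. A quick illustration (the sample string is just an example):

```python
# Italian text stored on disk as UTF-8 bytes.
text = "perché è così"
raw = text.encode("utf-8")

# Decoding those bytes as cp1252 (a common Windows default) turns each
# accented vowel into two mojibake characters.
garbled = raw.decode("cp1252")
print(garbled)  # perchÃ© Ã¨ cosÃ¬

# Decoding them as UTF-8 restores the original text.
assert raw.decode("utf-8") == text
```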
For a solution to:
AttributeError: Can't pickle local object 'train.<locals>.<lambda>'
EOFError: Ran out of input
Please see https://github.com/CorentinJ/Real-Time-Voice-Cloning/issues/669#issuecomment-781130738 for a workaround. We set num_workers=0 on Windows.
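The root cause: Windows uses the `spawn` start method for multiprocessing, so every DataLoader worker must receive its arguments via pickling, and a lambda defined inside `train()` cannot be pickled. With `num_workers=0` no worker processes are spawned, so nothing needs pickling. A stand-alone reproduction of the failure (not the repo's actual code):

```python
import pickle

def train():
    # A collate function defined as a local lambda, as in the failing code path.
    collate = lambda batch: batch
    return collate

try:
    pickle.dumps(train())
except AttributeError as e:
    # Same error as in the traceback above.
    print(e)  # Can't pickle local object 'train.<locals>.<lambda>'
```

Moving the function to module level (or setting `num_workers=0`) avoids the error, because module-level functions are picklable by reference.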
Thanks! Both errors are now solved... but the 20000-step train command is really slow... also I don't know why it says Using device: cpu
even though I installed the latest CUDA toolkit and I have a GTX 1050 Ti...
Never mind, I had the CPU-only version of PyTorch installed...
Let's suppose I got the Italian dataset from here (the ASR one, FLAC): http://www.openslr.org/94/ How am I supposed to create all the pretrained models from it (the .pt files for the vocoder, synthesizer and encoder)?
Hi, can you release the Italian models you trained? How do I set them up? I want to clone voices in this language.
@arianaglande Hello, I am looking for Italian models. Let me know if I can help train the model. I have an RTX 2070 GPU.
@arianaglande I'm also looking for it. If you managed to do that, it would be very helpful to share it with us. Thanks
I'm trying to do the same, and as @blue-fish said (if I understood correctly) I only need to train the synthesizer, so I can skip the first steps in https://github.com/CorentinJ/Real-Time-Voice-Cloning/wiki/Training#datasets until I reach: "Begin with the audios and the mel spectrograms: python synthesizer_preprocess_audio.py"
@arianaglande Hi, how did you manage to preprocess the italian dataset into the format the scripts accept?
Can someone please upload the pretrained models?