Help making Italian Vocoder/Synthesizer #697

xzVice commented 3 years ago

Let's suppose I got the Italian dataset from here (the ASR one, in FLAC format). How am I supposed to create all the pretrained models from it (the .pt files for the vocoder, synthesizer and encoder)?

ghost commented 3 years ago

Please start by reading my advice on training. This contains the link to the training documentation: #431 (comment)

If I were doing this, I would reuse the encoder and vocoder models. For the synthesizer, you have the option of training from scratch or finetuning the English model. Training from scratch should give better pronunciation and prosody. Finetuning will reduce training time and possibly have better voice similarity. If you finetune, modify the text cleaner to remove diacritics from vowels (change à to a, è and é to e, etc.). This is necessary since the English synthesizer does not include these characters in
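A minimal sketch of the diacritic-stripping step described above, using only Python's standard unicodedata module. The function name strip_diacritics is hypothetical and is not part of the repository's text cleaners; it would still need to be called from whichever cleaner the synthesizer uses. It removes every combining mark, which for Italian text amounts to the accents on vowels.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Replace accented letters with their unaccented base letters."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Category "Mn" (nonspacing mark) covers the grave and acute accents.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Recompose whatever is left into the usual NFC form.
    return unicodedata.normalize("NFC", stripped)


cleaned = strip_diacritics("perché è già così")
print(cleaned == "perche e gia cosi")
```

Running this on the transcripts before training keeps the input within the character set that the English model already knows.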

xzVice commented 3 years ago

Please start by reading my advice on training. This contains the link to training documentation: #431 (comment)

If I were doing this, I would reuse the encoder and vocoder models. For the synthesizer, you have the option of training from scratch or finetuning the English model. Training from scratch should give better pronunciation and prosody. Finetuning will reduce training time and possibly have better voice similarity. If you finetune, modify the text cleaner to remove diacritics from vowels (change à to a, è and é to e, etc.). This is necessary since the English synthesizer does not include these characters in

So, I tried doing what you told me to do, and everything was going well until the command... Here is the output of all the commands listed there (up to the train one, of course, which threw the error). Any idea? 🤔

I also noticed some weird symbols inside the SV2TTS/synthesizer/train.txt file... Is that normal? I tried editing the files, but that didn't fix it. Anyway, this is probably not what's causing the train command to crash...

C:\Users\Workspace\Desktop\Real-Time-Voice-Cloning>py -3.6 datasets_root --datasets_name LibriTTS --subfolders testing --no_alignments
    datasets_root:   datasets_root
    out_dir:         datasets_root\SV2TTS\synthesizer
    n_processes:     None
    skip_existing:   False
    no_alignments:   True
    datasets_name:   LibriTTS
    subfolders:      testing

Using data from:
LibriTTS: 100%|████████████████████████████████████████████████████████████████████| 1/1 [00:09<00:00,  9.52s/speakers]
The dataset consists of 9 utterances, 7450 mel frames, 1488960 audio timesteps (0.03 hours).
Max input length (text chars): 140
Max mel frames length: 889
Max audio timesteps length: 177600

C:\Users\Workspace\Desktop\Real-Time-Voice-Cloning>python datasets_root/SV2TTS/synthesizer
    synthesizer_root:      datasets_root\SV2TTS\synthesizer
    encoder_model_fpath:   encoder\saved_models\
    n_processes:           4

Embedding:   0%|                                                                         | 0/9 [00:00<?, ?utterances/s]Loaded encoder "" trained to step 1564501
Embedding: 100%|█████████████████████████████████████████████████████████████████| 9/9 [00:05<00:00,  1.73utterances/s]

C:\Users\Workspace\Desktop\Real-Time-Voice-Cloning>python testing datasets_root/SV2TTS/synthesizer
    run_id:          testing
    syn_dir:         datasets_root/SV2TTS/synthesizer
    models_dir:      synthesizer/saved_models/
    save_every:      1000
    backup_every:    25000
    force_restart:   False

Checkpoint path: synthesizer\saved_models\testing\
Loading training data from: datasets_root\SV2TTS\synthesizer\train.txt
Using model: Tacotron
Using device: cpu

Initialising Tacotron Model...

Trainable Parameters: 30.876M

Starting the training of Tacotron from scratch

Using inputs from:
Found 9 samples
| Steps with r=2 | Batch Size | Learning Rate | Outputs/Step (r) |
|   20k Steps    |     12     |     0.001     |        2         |

Traceback (most recent call last):
  File "", line 35, in <module>
  File "C:\Users\Workspace\Desktop\Real-Time-Voice-Cloning\synthesizer\", line 158, in train
    for i, (texts, mels, embeds, idx) in enumerate(data_loader, 1):
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\", line 355, in __iter__
    return self._get_iterator()
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\", line 301, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\utils\data\", line 914, in __init__
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'train.<locals>.<lambda>'

C:\Users\Workspace\Desktop\Real-Time-Voice-Cloning>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\Workspace\AppData\Local\Programs\Python\Python36\lib\multiprocessing\", line 115, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

ghost commented 3 years ago

I don't have time to fully troubleshoot issues, but this may help. If not, you'll need to figure it out yourself.

Weird characters in train.txt

Problem may be coming from this line, which reads the transcripts:

Try adding utf-8 file encoding.

with"r", encoding="utf-8") as text_file:
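To illustrate why the explicit encoding matters: on Windows, opening a text file without an encoding argument typically falls back to a legacy code page such as cp1252, so each UTF-8 accented letter is decoded as two unrelated characters. Below is a small self-contained sketch; the file name, the variable transcript_path and the sample sentence are made up for the demonstration and are not taken from the repository.

```python
import tempfile
from pathlib import Path

sample = "Perché la città è così bella?"

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical transcript file, written as UTF-8 like most datasets.
    transcript_path = Path(tmp_dir) / "utterance.txt"
    transcript_path.write_text(sample, encoding="utf-8")

    # Explicit UTF-8: the text round-trips unchanged.
    with transcript_path.open("r", encoding="utf-8") as text_file:
        good = text_file.read()

    # Simulates the Windows default code page: accents turn into mojibake.
    with transcript_path.open("r", encoding="cp1252") as text_file:
        garbled = text_file.read()

print(good == sample)     # True
print(garbled == sample)  # False
```

If train.txt was produced with the wrong encoding, preprocessing has to be rerun after the fix, since the garbled text is already written to the output.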

Error running

For a solution to:

AttributeError: Can't pickle local object 'train.<locals>.<lambda>'
EOFError: Ran out of input

Please see for a workaround. We set num_workers=0 on Windows.
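For context: Windows starts DataLoader worker processes with the spawn method, which has to pickle what it sends to each worker, and a lambda defined inside train (the object named in the error message) cannot be pickled. With num_workers=0 the data is loaded in the main process, so nothing needs to be pickled. Here is a rough sketch of a platform check along those lines; the helper name choose_num_workers is hypothetical and this is not the repository's actual code.

```python
import platform


def choose_num_workers(requested: int, system: str = "") -> int:
    """Return 0 on Windows, otherwise the requested worker count."""
    # `system` can be passed explicitly for testing; by default it is detected.
    system = system or platform.system()
    return 0 if system == "Windows" else requested


# Intended use (assumes PyTorch is installed and `dataset` already exists):
#   DataLoader(dataset, batch_size=12, num_workers=choose_num_workers(2))
print(choose_num_workers(2))
```

Loading in the main process is slower than using workers, but it avoids the pickling step entirely.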

xzVice commented 3 years ago

Thanks! Both errors are solved now, but the 20,000-step train command is really slow. Also, I don't know why it says "Using device: cpu" even though I installed the latest CUDA toolkit and I have a GTX 1050 Ti...

xzVice commented 3 years ago

Never mind, I had the CPU version of PyTorch installed...
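For anyone hitting the same thing, here is a quick way to check which PyTorch build is installed before starting a long training run. This is only a sketch; the function name describe_torch_device is made up, and it relies on torch.cuda.is_available(), torch.cuda.get_device_name() and torch.version.cuda.

```python
def describe_torch_device() -> str:
    """Report whether the installed PyTorch build can use a CUDA GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        return "cuda: " + torch.cuda.get_device_name(0)
    # torch.version.cuda is None when the build was compiled without CUDA.
    if torch.version.cuda is None:
        return "cpu (this PyTorch build has no CUDA support)"
    return "cpu (CUDA build, but no usable GPU or driver was found)"


print(describe_torch_device())
```

Installing the CUDA toolkit does not change the result if the PyTorch package itself was built without CUDA support.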

TalissaDreossi commented 3 years ago

I'm trying to do the same and, as @blue-fish said (if I understood correctly), I just need to train the synthesizer, so I have to skip the first steps until I reach:
"Begin with the audios and the mel spectrograms:
python ...".
Is that right? If so, how should I structure my dataset? I have downloaded the Italian one, but I don't know whether I have to preprocess it before running the instruction above (in other words, I don't know what layout that command expects). Thanks in advance.
