Open ilnmtlbnm opened 4 years ago
If you are using the LJS model, that might be expected, as it is a single-speaker model. You could try using the LibriTTS model instead.
Just a note - when using LibriTTS you will also have to change the n_speakers parameter in config.json to 123:
"model_config": {
    "n_speakers": 123,
    "n_speaker_dim": 128,
    "n_text": 185,
    "n_text_dim": 512,
    "n_flows": 2,
    "n_mel_channels": 80,
    "n_attn_channels": 640,
    "n_hidden": 1024,
    "n_lstm_layers": 2,
    "mel_encoder_n_hidden": 512,
    "n_components": 0,
    "mean_scale": 0.0,
    "fixed_gaussian": true,
    "dummy_speaker_embedding": false,
    "use_gate_layer": true
}
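For anyone who would rather script the n_speakers change than edit config.json by hand, here is a minimal sketch. It assumes only that the config is plain JSON with a "model_config" section, as quoted in this thread; the helper name set_n_speakers is made up for illustration and is not part of the Flowtron repo.

```python
import json


def set_n_speakers(config_path, n_speakers):
    """Rewrite model_config.n_speakers in a JSON config file.

    Hypothetical helper: it only assumes the file has a top-level
    "model_config" object, as in the config quoted in this thread.
    """
    with open(config_path) as f:
        config = json.load(f)

    # 123 is the value given in this thread for the LibriTTS model.
    config["model_config"]["n_speakers"] = n_speakers

    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config
```

Calling set_n_speakers("config.json", 123) would then apply the value mentioned above; all other keys in the file are written back unchanged.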
Of course, thanks @karkirowle !
And thanks @Quasimondo for specifying n_speakers for LibriTTS.
D'oh! Again, I closed too fast; it still doesn't work with LibriTTS.
python inference.py -c config.json -f models/flowtron_libritts.pt -w models/waveglow_256channels_v4.pt -t "But the machine only creates what humans have taught it to " -i 15 -n 777 -s 0.5
Yeah - I realized that you will also have to adjust the "data_config" section: "training_files": "filelists/libritts_train_clean_100_audiopath_text_sid_shorterthan10s_atleast5min_train_filelist.txt"
And lastly, you will have to pick a speaker ID that actually exists. They are not numbered consecutively, so you have to look them up in that filelist (they are the numbers at the end of each line).
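If you just want the list of usable IDs without opening the filelist by hand, something like the following should work. It is a rough sketch that assumes each line is pipe-separated with the speaker ID as the last field (audiopath|text|sid); the function name and the sample lines are invented for illustration.

```python
def speaker_ids_from_filelist(lines):
    """Collect the distinct speaker IDs found at the end of each line.

    Assumes pipe-separated lines whose last field is an integer
    speaker ID. Blank lines are skipped.
    """
    ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The speaker ID is whatever follows the last '|'.
        ids.add(int(line.rsplit("|", 1)[-1]))
    return sorted(ids)


# Made-up sample lines in the assumed format, not real filelist entries.
sample = [
    "a.wav|hello there|40",
    "b.wav|second line|1069",
    "c.wav|third line|40",
]
print(speaker_ids_from_filelist(sample))  # prints [40, 1069]
```

For the real filelist, open the file and pass the file object, for example: with open(path, encoding="utf-8") as f: ids = speaker_ids_from_filelist(f). Any of the returned numbers can then be passed to inference.py with -i.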
Thanks again @Quasimondo
For reference, here are the valid ids for LibriTTS :
40 78 83 87 118 125 196 200 250 254 374 405 446 460 587 669 696 730 831 887 1069 1088 1116 1246 1263
1502 1578 1841 1867 1963 1970 2092 2136 2182 2196 2289 2416 2436 2836 2843 2911 2952 3240 3242 3259
3436 3486 3526 3664 3857 3879 3982 3983 4018 4051 4088 4160 4195 4267 4297 4362 4397 4406 4640 4680
4788 5022 5104 5322 5339 5393 5652 5678 5703 5750 5808 6019 6064 6078 6081 6147 6181 6209 6272 6367
6385 6415 6437 6454 6476 6529 6818 6836 6848 7059 7067 7078 7178 7190 7226 7278 7302 7367 7402 7447
7505 7511 7794 7800 8051 8088 8098 8108 8123 8238 8312 8324 8419 8468 8609 8629 8770 8838
Thank you for compiling this list!
I added a script that extracts the available speaker IDs (sid). See below:
https://github.com/yhgon/flowtron/blob/master/inference_colab.ipynb
import os
import sys
import pandas as pd
import numpy as np
import random
from itertools import cycle
from data import load_filepaths_and_text
!cat /content/flowtron/filelists/libritts_speakerinfo.txt | tail -n +12 | head -n 10
filelist_path = "/content/flowtron/filelists/libritts_train_clean_100_audiopath_text_sid_shorterthan10s_atleast5min_train_filelist.txt"
def create_speaker_lookup_table(audiopaths_and_text):
    speaker_ids = np.sort(np.unique([x[2] for x in audiopaths_and_text]))
    d = {int(speaker_ids[i]): i for i in range(len(speaker_ids))}
    print("Number of speakers :", len(d))
    return d
audiopaths_and_text = load_filepaths_and_text(filelist_path)
speaker_ids = create_speaker_lookup_table(audiopaths_and_text).keys()
print(speaker_ids)
speakers = pd.read_csv('/content/flowtron/filelists/libritts_speakerinfo.txt', engine='python', header=None, comment=';', sep=r' *\| *', names=['ID', 'SEX', 'SUBSET', 'MINUTES', 'NAME'])
speakers['FLOWTRON_ID'] = speakers['ID'].apply(lambda x: x if x in speaker_ids else -1)
female_speakers = speakers.query("SEX == 'F' and MINUTES > 20 and FLOWTRON_ID >= 0")['FLOWTRON_ID'].sample(frac=1).tolist()
male_speakers = speakers.query("SEX == 'M' and MINUTES > 20 and FLOWTRON_ID >= 0")['FLOWTRON_ID'].sample(frac=1).tolist()
print("female speakers : ", len(female_speakers), female_speakers )
print("male speakers : ", len(male_speakers), male_speakers )
There is a Speaker id argument in inference.py: parser.add_argument('-i', '--id', help='Speaker id', type=int). Whenever I try to change it to something other than 0, I get the following error: