ttt733 opened 3 years ago
I cannot get the LibriTTS2K model to work with inference either, actually. I don't think the model is loading the weights properly, as it seems to generate only random noise. If you see anything I'm doing wrong, let me know; if I can get it working, I'll put in a PR to update the readme instructions.
config.json
{
"train_config": {
"output_directory": "/outdir",
"epochs": 10000000,
"optim_algo": "RAdam",
"learning_rate": 1e-3,
"weight_decay": 1e-6,
"grad_clip_val": 1,
"sigma": 1.0,
"iters_per_checkpoint": 1000,
"batch_size": 1,
"seed": 1234,
"checkpoint_path": "models/flowtron_libritts2p3k.pt",
"ignore_layers": [],
"finetune_layers": [],
"include_layers": ["speaker", "encoder", "embedding"],
"warmstart_checkpoint_path": "",
"with_tensorboard": true,
"fp16_run": true,
"gate_loss": true,
"use_ctc_loss": true,
"ctc_loss_weight": 0.01,
"blank_logprob": -8,
"ctc_loss_start_iter": 10000
},
"data_config": {
"training_files": "filelists/libritts_train_clean_100_audiopath_text_sid_shorterthan10s_atleast5min_train_filelist.txt",
"validation_files": "filelists/libritts_train_clean_100_audiopath_text_sid_atleast5min_val_filelist.txt",
"text_cleaners": ["flowtron_cleaners"],
"p_arpabet": 0.5,
"cmudict_path": "data/cmudict_dictionary",
"sampling_rate": 22050,
"filter_length": 1024,
"hop_length": 256,
"win_length": 1024,
"mel_fmin": 0.0,
"mel_fmax": 8000.0,
"max_wav_value": 32768.0,
"use_attn_prior": true,
"attn_prior_threshold": 0.0,
"prior_cache_path": "/attention_prior_cache",
"betab_scaling_factor": 1.0,
"keep_ambiguous": false
},
"dist_config": {
"dist_backend": "nccl",
"dist_url": "tcp://localhost:54321"
},
"model_config": {
"n_speakers": 123,
"n_speaker_dim": 128,
"n_text": 185,
"n_text_dim": 512,
"n_flows": 2,
"n_mel_channels": 80,
"n_attn_channels": 640,
"n_hidden": 1024,
"n_lstm_layers": 2,
"mel_encoder_n_hidden": 512,
"n_components": 0,
"mean_scale": 0.0,
"fixed_gaussian": true,
"dummy_speaker_embedding": false,
"use_gate_layer": true,
"use_cumm_attention": false
}
}
Command:
python inference.py -o ./outdir -c config.json -f models/flowtron_libritts2p3k.pt -w models/waveglow_256channels_universal_v5.pt -t "It is well known that deep generative models have a rich latent space!" -i 1088
Output: a 410 KB wav file of static. The WaveGlow model (v5) is the one linked in that repo's readme. And since it was mentioned in #74, my torch version is torch==1.8.1+cu111, though I wasn't sure what exactly was meant by "try inference in fp32."
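For what it's worth, my best guess is that "fp16_run": true in the config above means half precision, so "try inference in fp32" would mean flipping that flag to false and/or forcing both networks to full precision after loading. A minimal sketch, assuming the model and waveglow variable names from inference.py:

# Hedged sketch: force full (fp32) precision before synthesis.
# 'model' and 'waveglow' follow inference.py's naming; if your
# script differs, cast whatever objects hold the two networks.
model = model.float().eval()
waveglow = waveglow.float().eval()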
waveglow_256channels_universal_v5.pt gives me nothing but noise as well. For a long time I could not figure out what was happening; then I switched to v4 and everything worked.
@ttt733 are you able to produce spectrograms with the pre-trained model?
No. I'm attempting to use the LibriTTS2K model linked in the repo, and I've tried with WaveGlow v5 and v4 without success. In my latest attempt I'm also getting warnings from PyTorch:
~/dev/flowtron$ python inference.py -o ./outdir -c config.json -f models/flowtron_libritts2p3k.pt -w models/waveglow_256channels_universal_v4.pt -t "It is well known that deep generative models have a rich latent space!" -i 1088
/home/trevor/anaconda3/envs/blitz/lib/python3.8/site-packages/torch/serialization.py:671: SourceChangeWarning: source code of class 'torch.nn.modules.conv.ConvTranspose1d' has changed. Saved a reverse patch to ConvTranspose1d.patch. Run `patch -p0 < ConvTranspose1d.patch` to revert your changes.
warnings.warn(msg, SourceChangeWarning)
/home/trevor/anaconda3/envs/blitz/lib/python3.8/site-packages/torch/serialization.py:671: SourceChangeWarning: source code of class 'torch.nn.modules.container.ModuleList' has changed. Tried to save a patch, but couldn't create a writable file ModuleList.patch. Make sure it doesn't exist and your working directory is writable.
warnings.warn(msg, SourceChangeWarning)
/home/trevor/anaconda3/envs/blitz/lib/python3.8/site-packages/torch/serialization.py:671: SourceChangeWarning: source code of class 'torch.nn.modules.conv.Conv1d' has changed. Saved a reverse patch to Conv1d.patch. Run `patch -p0 < Conv1d.patch` to revert your changes.
warnings.warn(msg, SourceChangeWarning)
The result is the same as what I posted above. PyTorch version is still 1.10.0.dev20210609.
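Side note: those SourceChangeWarnings should be benign; they just mean the checkpoint pickled full module source under an older torch version. If they clutter the logs, something like this quiets them:

import warnings
from torch.serialization import SourceChangeWarning

# Silence the (harmless) source-change warnings raised when
# unpickling modules that were saved under an older torch.
warnings.filterwarnings("ignore", category=SourceChangeWarning)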
It works with waveglow_256channels_ljs_v3.pt. To download:
curl -LO 'https://api.ngc.nvidia.com/v2/models/nvidia/waveglow_ljs_256channels/versions/3/files/waveglow_256channels_ljs_v3.pt'
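In case the loading pattern is the sticking point for anyone: the released WaveGlow checkpoints pickle the whole module under a 'model' key, so loading looks roughly like this (a sketch of what inference.py does, as far as I can tell; adjust the path to wherever you saved the file):

import torch

# The NVIDIA WaveGlow checkpoints store the full module under 'model'.
waveglow = torch.load("models/waveglow_256channels_ljs_v3.pt",
                      map_location="cpu")["model"]
waveglow.cuda().eval()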
Unless I'm missing something, the fine-tuning instructions in the readme do not work: train.py expects checkpoint fields that the published model does not provide. Hacking around the missing iteration value with
iteration = 1
has been mentioned in previous issues, and the optimizer can be skipped over by putting a dummy value into ignore_layers, but it seems like making the published model fit the code would be ideal.
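For anyone hitting the same wall, here is roughly the shape of that workaround folded into one patched loader. The key names ('iteration', 'state_dict', 'model', 'optimizer') are my assumptions about the checkpoint layout, so print list(torch.load(path, map_location='cpu').keys()) first and adjust:

import torch

def load_checkpoint_patched(checkpoint_path, model, optimizer, ignore_layers):
    checkpoint_dict = torch.load(checkpoint_path, map_location='cpu')
    # The released flowtron_libritts2p3k.pt has no 'iteration' entry,
    # so fall back to 1 (the hack mentioned in earlier issues).
    iteration = checkpoint_dict.get('iteration', 1)
    state = checkpoint_dict.get('state_dict', checkpoint_dict.get('model'))
    if hasattr(state, 'state_dict'):  # a full nn.Module was pickled
        state = state.state_dict()
    if ignore_layers:
        # Drop the listed layers, then fill the gaps from a fresh model.
        state = {k: v for k, v in state.items() if k not in ignore_layers}
        base = model.state_dict()
        base.update(state)
        state = base
    model.load_state_dict(state)
    # The released checkpoint has no optimizer state either; in the
    # original code a dummy entry in ignore_layers skips that load,
    # here it is just guarded explicitly.
    if 'optimizer' in checkpoint_dict:
        optimizer.load_state_dict(checkpoint_dict['optimizer'])
    return model, optimizer, iteration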