Closed lihaoyangML closed 3 years ago
Hi Eren, I am encountering the same issue here as well. I tried to train the Glow-TTS model on the LJSpeech dataset with version 0.0.13, and the avg_align_error in the Tensorboard output likewise stays constant 5k steps into the training. I have also experimented with different encoder_type settings, and the problem persists. All trainings were carried out with the default configs from the current repo / the configs for the released models:
- residual_bn_conv: avg_align_error 0.05798
- gated_conv: avg_align_error 0.1014
- rel_pos_transformer: avg_align_error 0.1014
- time_depth_separable: avg_align_error 0.1014
I do understand that this is an out-of-date version, but I would still like to seek your advice on whether the issue can be resolved. Thank you in advance!
Why do you prefer to use the older version?
Hi Eren, thanks for the reply. I am using the older version because I want to experiment with fine-tuning the released multi-speaker Glow-TTS models from the repo on my own multi-speaker dataset. However, I think the pretrained models, such as SC-GlowTTS-Trans, are not compatible with the newer version.
Nope, they are all compatible.
Hi Eren, thank you for your reply! I switched over to the latest version (v0.1.2) earlier on, and it seems like the issue persists.

Tensorboard output: avg_align_error stays constant at 0.09933.

Training script (identical to recipes/ljspeech/glow_tts/train_glowtts.py):
```python
import os

from TTS.tts.configs import GlowTTSConfig
from TTS.tts.configs import BaseDatasetConfig
from TTS.trainer import init_training, Trainer, TrainingArgs

output_path = "/home/david_yan/coqui_ai_tts/glow_tts_models"

dataset_config = BaseDatasetConfig(
    name="ljspeech",
    meta_file_train="metadata.csv",
    path="/home/david_yan/mozilla_tts/TTS/tts/datasets/LJSpeech-1.1/",
)

config = GlowTTSConfig(
    batch_size=32,
    eval_batch_size=16,
    num_loader_workers=4,
    num_eval_loader_workers=4,
    run_eval=True,
    test_delay_epochs=-1,
    epochs=1000,
    text_cleaner="english_cleaners",
    use_phonemes=False,
    phoneme_language="en-us",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    print_step=25,
    print_eval=True,
    mixed_precision=False,
    output_path=output_path,
    datasets=[dataset_config],
)

args, config, output_path, _, c_logger, tb_logger = init_training(TrainingArgs(), config)
trainer = Trainer(args, config, output_path, c_logger, tb_logger)
trainer.fit()
```
BTW I just realized that align_error does not really make sense for models using duration predictors: it measures how diagonal and decisive the alignment is, but since the alignment is 0-1 valued for such models, it computes the same error at every step.

So the solution to this issue is to remove align_error from these models. But for now, you can just ignore it.
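To illustrate why a 0-1 valued alignment pins the metric, here is a small self-contained sketch. The align_error function below is a hypothetical stand-in in the spirit of a diagonal-alignment score (1 minus the mean per-frame peak attention weight), not the library's exact code:

```python
import numpy as np

def align_error(alignment):
    # Hypothetical stand-in for the trainer's alignment metric:
    # 1 minus the mean of the per-output-frame peak attention weight.
    return 1.0 - alignment.max(axis=0).mean()

# Soft attention (e.g. Tacotron-style): the error drops as the
# attention sharpens over training.
soft_early = np.full((4, 4), 0.25)        # diffuse attention, early training
soft_late = np.eye(4) * 0.85 + 0.05       # nearly diagonal, later training
print(align_error(soft_early))            # large error
print(align_error(soft_late))             # much smaller error

# Hard 0/1 alignment (duration predictor / monotonic alignment search):
# the per-frame peak is always exactly 1, so the metric is identical
# at every step no matter how much the model improves.
hard = np.eye(4)
print(align_error(hard))                  # 0.0 at every step
```

Under this toy metric, a hard alignment scores the same constant value from the first step onward, which is consistent with the flat avg_align_error curves reported above.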
"I just realized that the align_error does not really make sense with models using duration predictors..." - Is this also true for Taco2 DCA? Because I'm experiencing the same issue where avg_align_error isn't changing at all efter each epoch.
This is a new dataset I'm using, and I had some success with it using Nvidia NeMo, but so far Coqui Glow-TTS and Taco2 are just producing noise.
It'd just produce noise until the attention aligns.
Hi, I think issue 408 is still present in both v0.0.12 and v0.0.13. When I used glow_tts_ljspeech.json to run a training on LJSpeech, the avg_align_error remained constant. Here is the Tensorboard output:
To Reproduce
Installation of the library
```shell
git clone https://github.com/coqui-ai/TTS.git
cd TTS  # change into the clone before resetting
git reset --hard f02f033
pip3 install -e .
pip3 install packaging
pip3 install numba==0.48
pip3 install umap-learn==0.4.6
```
Updating the path information in glow_tts_ljspeech.json
{ "model": "glow_tts", "run_name": "glow-tts-residual_bn_conv", "run_description": "glow-tts model training with residual BN conv.",
Running Glow TTS training
CUDA_VISIBLE_DEVICES="1" python TTS/bin/train_glow_tts.py --config_path TTS/tts/configs/glow_tts_ljspeech.json
Stopped training after 4k steps, as avg_align_error did not change.
My environment:
I have tried both v0.0.12 and v0.0.13, and the issue persisted. Thank you, and please let me know if you need more information from me.