coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0

[Bug] Not able to train glowtts #1447

Closed Arjunprasaath closed 2 years ago

Arjunprasaath commented 2 years ago

I wanted to train a GlowTTS model on a custom dataset that is structured like the LJSpeech dataset. When I run training I get the error attached below. Please guide me on what is being done wrong. The updated version of the train_glowtts.py file is shared below.

train_glowtts.py:

import os

from trainer import Trainer, TrainerArgs

from TTS.tts.configs.glow_tts_config import GlowTTSConfig

from TTS.tts.configs.shared_configs import BaseDatasetConfig
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.glow_tts import GlowTTS
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor

output_path = os.path.dirname("/Users/arjunsmac/Documents/CODING/python_code/Final year project/Python/output")

dataset_config = BaseDatasetConfig(
    name="ljspeech", meta_file_train="metadata.txt", path="/Users/arjunsmac/Documents/CODING/python_code/Final year project/dataset16k/"
)

config = GlowTTSConfig(
    batch_size=32,
    eval_batch_size=16,
    num_loader_workers=0,
    num_eval_loader_workers=0,
    run_eval=False,
    test_delay_epochs=-1,
    epochs=10,
    text_cleaner="phoneme_cleaners",
    use_phonemes=True,
    phoneme_language="en-us",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    print_step=2,
    print_eval=False,
    mixed_precision=True,
    output_path=output_path,
    datasets=[dataset_config],
)

ap = AudioProcessor.init_from_config(config)

tokenizer, config = TTSTokenizer.init_from_config(config)

train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=False)

model = GlowTTS(config, ap, tokenizer, speaker_manager=None)

trainer = Trainer(
    TrainerArgs(), config, output_path, model=model, train_samples=train_samples
)

trainer.fit()

Error:

 > Setting up Audio Processor...
 | > sample_rate:16000
 | > resample:False
 | > num_mels:80
 | > log_func:np.log10
 | > min_level_db:-100
 | > frame_shift_ms:None
 | > frame_length_ms:None
 | > ref_level_db:20
 | > fft_size:1024
 | > power:1.5
 | > preemphasis:0.0
 | > griffin_lim_iters:60
 | > signal_norm:True
 | > symmetric_norm:True
 | > mel_fmin:0
 | > mel_fmax:None
 | > pitch_fmin:0.0
 | > pitch_fmax:640.0
 | > spec_gain:20.0
 | > stft_pad_mode:reflect
 | > max_norm:4.0
 | > clip_norm:True
 | > do_trim_silence:True
 | > trim_db:45
 | > do_sound_norm:False
 | > do_amp_to_db_linear:True
 | > do_amp_to_db_mel:True
 | > do_rms_norm:False
 | > db_level:None
 | > stats_path:None
 | > base:10
 | > hop_length:256
 | > win_length:1024
 | > Found 32 files in /Users/arjunsmac/Documents/CODING/python_code/Final year project/dataset16k
fatal: not a git repository (or any of the parent directories): .git
fatal: not a git repository (or any of the parent directories): .git
 > Using CUDA: False
 > Number of GPUs: 0

 > Model has 28610257 parameters

 > EPOCH: 0/10
 --> /Users/arjunsmac/Documents/CODING/python_code/Final year project/Python/run-March-25-2022_11+36AM-0000000

> DataLoader initialization
| > Tokenizer:
    | > add_blank: False
    | > use_eos_bos: False
    | > use_phonemes: True
    | > phonemizer:
        | > phoneme language: en-us
        | > phoneme backend: gruut
| > Number of instances : 32
 | > Preprocessing samples
 | > Max text length: 47
 | > Min text length: 8
 | > Avg text length: 29.5625
 | 
 | > Max audio length: 118950.0
 | > Min audio length: 16550.0
 | > Avg audio length: 55888.0
 | > Num. instances discarded samples: 0
 | > Batch group size: 0.

 > TRAINING (2022-03-25 11:36:04) 
/Users/arjunsmac/Documents/CODING/python_code/Final year project/Python/venv/lib/python3.7/site-packages/torch/autocast_mode.py:162: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
/Users/arjunsmac/Documents/CODING/python_code/Final year project/Python/venv/lib/python3.7/site-packages/TTS/tts/models/glow_tts.py:517: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  y_lengths = (y_lengths // self.num_squeeze) * self.num_squeeze

   --> STEP: 0/1 -- GLOBAL_STEP: 0
     | > current_lr: 0.00000 
     | > step_time: 4.80510  (4.80505)
     | > loader_time: 0.17340  (0.17338)

/Users/arjunsmac/Documents/CODING/python_code/Final year project/Python/venv/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
 | > Synthesizing test sentences.

  --> EVAL PERFORMANCE
     | > avg_loader_time: 0.17338 (+0.00000)
     | > avg_step_time: 4.80505 (+0.00000)

 ! Run is removed from /Users/arjunsmac/Documents/CODING/python_code/Final year project/Python/run-March-25-2022_11+36AM-0000000
Traceback (most recent call last):
  File "/Users/arjunsmac/Documents/CODING/python_code/Final year project/Python/venv/lib/python3.7/site-packages/trainer/trainer.py", line 1461, in fit
    self._fit()
  File "/Users/arjunsmac/Documents/CODING/python_code/Final year project/Python/venv/lib/python3.7/site-packages/trainer/trainer.py", line 1455, in _fit
    self.save_best_model()
  File "/Users/arjunsmac/Documents/CODING/python_code/Final year project/Python/venv/lib/python3.7/site-packages/trainer/trainer.py", line 1488, in save_best_model
    target_loss_dict = self._pick_target_avg_loss(self.keep_avg_eval if self.keep_avg_eval else self.keep_avg_train)
  File "/Users/arjunsmac/Documents/CODING/python_code/Final year project/Python/venv/lib/python3.7/site-packages/trainer/trainer.py", line 1654, in _pick_target_avg_loss
    target_avg_loss = keep_avg_target["avg_loss"]
  File "/Users/arjunsmac/Documents/CODING/python_code/Final year project/Python/venv/lib/python3.7/site-packages/trainer/generic_utils.py", line 98, in __getitem__
    return self.avg_values[key]
KeyError: 'avg_loss'

WeberJulian commented 2 years ago

Hey, the problem probably comes from the fact that you have only 32 samples. It's not enough to train a TTS model.
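For a sense of scale, the numbers printed in the log above can be turned into steps per epoch and total audio duration. This is a rough sketch that uses only values from the log (32 instances, batch size 32, 16 kHz sample rate, average audio length 55888 samples). It assumes the loader does not drop the last batch, which is consistent with the `STEP: 0/1` line.

```python
import math

# Values copied from the training log in this issue.
num_samples = 32
batch_size = 32
sample_rate = 16000          # Hz
avg_audio_length = 55888.0   # average samples per clip

# Assumption: the last (possibly partial) batch is kept.
steps_per_epoch = math.ceil(num_samples / batch_size)

# Total audio in seconds = number of clips * average clip length / sample rate.
total_seconds = num_samples * avg_audio_length / sample_rate

print(f"steps per epoch: {steps_per_epoch}")    # steps per epoch: 1
print(f"total audio: {total_seconds:.1f} s")    # total audio: 111.8 s
```

With these settings each epoch is a single optimizer step over under two minutes of audio.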

erogol commented 2 years ago

I think there are no eval samples. But the trainer should have raised a warning or error.
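A check along these lines could be run in the training script before `trainer.fit()`. It is only a sketch: `check_eval_setup` is a hypothetical helper, not part of the `trainer` or `TTS` packages, and it treats samples as plain lists.

```python
import warnings


def check_eval_setup(train_samples, eval_samples, run_eval):
    """Hypothetical pre-flight check; not part of the trainer package.

    Raises if the configuration asks for evaluation without eval data,
    and warns if eval data is supplied but will never be used.
    """
    if not train_samples:
        raise ValueError("No training samples were loaded.")
    if run_eval and not eval_samples:
        raise ValueError("run_eval is True but no eval samples were provided.")
    if not run_eval and eval_samples:
        warnings.warn("Eval samples were provided but run_eval is False.")
    return True


# Mirrors the setup in this issue: training data only, evaluation disabled.
dummy_train = [{"text": "hello", "audio_file": "a.wav"}]  # placeholder sample
print(check_eval_setup(dummy_train, None, run_eval=False))  # True
```

Calling it with `run_eval=True` and no eval samples raises a `ValueError` up front instead of failing later inside the training loop.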

WeberJulian commented 2 years ago

@Arjunprasaath, you can set run_eval = False

Arjunprasaath commented 2 years ago

@WeberJulian It is already set to False. @erogol I used such a small dataset only to check that training at least works, so that I could then move to a bigger one.

erogol commented 2 years ago

I don't think it is set to False, or it is not set correctly, as the trainer runs eval steps anyhow.
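For reference, the traceback shows `save_best_model` falling back to the training averages when there are no eval averages, and the `EVAL PERFORMANCE` block in the log lists only `avg_loader_time` and `avg_step_time`. The sketch below reproduces that lookup failure in isolation. `RunningAverages` and both functions are hypothetical stand-ins, not the trainer's actual classes, and the sketch does not explain why no loss was recorded in that step.

```python
class RunningAverages:
    """Hypothetical stand-in for the trainer's average tracker."""

    def __init__(self):
        self.avg_values = {}

    def __getitem__(self, key):
        return self.avg_values[key]


def pick_target_avg_loss(keep_avg_eval, keep_avg_train):
    # Same fallback as in the traceback: use eval averages if present,
    # otherwise the training averages.
    target = keep_avg_eval if keep_avg_eval else keep_avg_train
    return target["avg_loss"]


def pick_target_avg_loss_safe(keep_avg_eval, keep_avg_train):
    # Defensive variant (an assumption, not the library's behavior):
    # return None when no loss has been recorded yet.
    target = keep_avg_eval if keep_avg_eval else keep_avg_train
    return target.avg_values.get("avg_loss")


# Only the timing keys from the log are present; no loss was tracked.
keep_avg_train = RunningAverages()
keep_avg_train.avg_values.update(
    {"avg_loader_time": 0.17338, "avg_step_time": 4.80505}
)

try:
    pick_target_avg_loss(None, keep_avg_train)
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'avg_loss'

print(pick_target_avg_loss_safe(None, keep_avg_train))  # None
```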

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.