Closed dothehansa closed 1 year ago
@WeberJulian can you check?
I've just tried running the same script in a Kaggle notebook as a check and reached the same error, so it may be independent of the environment?
Yeah, that's a known issue; Capacitron is pretty unstable to train. I see you changed the original recipe, at least the max_audio_len. Try a larger batch size and different length boundaries; you can also experiment with the ga_loss and others.
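A minimal sketch of the kind of overrides suggested above. All values here are illustrative guesses to experiment with, not a known-good configuration:

```python
# Illustrative Capacitron recipe overrides -- guesses to experiment with,
# not a tested recipe. Values would be passed into Tacotron2Config.
overrides = {
    "batch_size": 128,                  # larger batches tend to stabilize the VAE loss
    "min_audio_len": int(1.0 * 22050),  # tighter length boundaries (seconds * sample_rate)
    "max_audio_len": int(8.0 * 22050),
    "ga_alpha": 5.0,                    # re-enable guided attention loss to experiment with
}
print(overrides["max_audio_len"])
```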
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also check our discussion channels.
Describe the bug
After training using the tacotron2_capacitron recipe I consistently run into this error:
ValueError: Expected parameter loc (Tensor of shape (64, 128)) of distribution MultivariateNormal(loc: torch.Size([64, 128]), covariance_matrix: torch.Size([64, 128, 128])) to satisfy the constraint IndependentConstraint(Real(), 1), but found invalid values: tensor([[nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], ..., [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan], [nan, nan, nan, ..., nan, nan, nan]], grad_fn=)
To Reproduce
Running this command: CUDA_VISIBLE_DEVICES=0 python3 t2_capacitron.py (it also happens with a continue path: --continue_path ~/Capacitron-Tacotron2-February-25-2023_09+15AM-0000000). I can restart the training and it will continue for a while before crashing again.
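Since the run crashes again after resuming, one way to narrow down the failure is to check whether the checkpoint being resumed from already contains NaN weights. A minimal sketch, using nested Python lists in place of real tensors (find_nan_params is a hypothetical helper, not part of TTS or the trainer):

```python
import math

def find_nan_params(state_dict):
    """Return the names of parameters that contain NaN values.

    Here state_dict maps parameter names to nested lists of floats,
    a simplification of a real torch state_dict for illustration.
    """
    def has_nan(x):
        if isinstance(x, list):
            return any(has_nan(v) for v in x)
        return isinstance(x, float) and math.isnan(x)

    return [name for name, tensor in state_dict.items() if has_nan(tensor)]

# Example: one healthy parameter and one corrupted one
sd = {
    "encoder.weight": [[0.1, 0.2], [0.3, 0.4]],
    "capacitron_vae.post_mean": [[float("nan"), 0.5]],
}
print(find_nan_params(sd))  # -> ['capacitron_vae.post_mean']
```

With a real checkpoint you would iterate over the loaded state dict's tensors instead; if the saved weights are already NaN, resuming from an earlier checkpoint is the only option.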
Expected behavior
The training run should continue without crashing.
Logs
Environment
Additional context
import os
from trainer import Trainer, TrainerArgs
from TTS.config.shared_configs import BaseAudioConfig
from TTS.tts.configs.shared_configs import BaseDatasetConfig, CapacitronVAEConfig
from TTS.tts.configs.tacotron2_config import Tacotron2Config
from TTS.tts.datasets import load_tts_samples
from TTS.tts.models.tacotron2 import Tacotron2
from TTS.tts.utils.text.tokenizer import TTSTokenizer
from TTS.utils.audio import AudioProcessor
output_path = '~/TTS/dev'

dataset_config = BaseDatasetConfig(
    formatter='ljspeech',
    meta_file_train="metadata.csv",
    path=output_path,
)

audio_config = BaseAudioConfig(
    sample_rate=22050,
    do_trim_silence=True,
    trim_db=60.0,
    signal_norm=False,
    mel_fmin=0.0,
    mel_fmax=11025,
    spec_gain=1.0,
    log_func="np.log",
    ref_level_db=20,
    preemphasis=0.0,
)
# Using the standard Capacitron config
capacitron_config = CapacitronVAEConfig(capacitron_VAE_loss_alpha=1.0, capacitron_capacity=50)
config = Tacotron2Config(
    run_name="Capacitron-Tacotron2",
    audio=audio_config,
    capacitron_vae=capacitron_config,
    use_capacitron_vae=True,
    batch_size=64,  # Tune this to your gpu
    max_audio_len=6 * 22050,  # Tune this to your gpu
    min_audio_len=0.5 * 22050,
    eval_batch_size=16,
    num_loader_workers=4,
    num_eval_loader_workers=4,
    precompute_num_workers=4,
    run_eval=True,
    test_delay_epochs=100,
    ga_alpha=0.0,
    r=2,
    optimizer="CapacitronOptimizer",
    optimizer_params={"RAdam": {"betas": [0.9, 0.998], "weight_decay": 1e-6}, "SGD": {"lr": 1e-5, "momentum": 0.9}},
    attention_type="dynamic_convolution",
    grad_clip=0.0,  # Important! We overwrite the standard grad_clip with capacitron_grad_clip
    double_decoder_consistency=False,
    epochs=1000,
    text_cleaner="phoneme_cleaners",
    use_phonemes=True,
    phoneme_language="en-us",
    phonemizer="espeak",
    phoneme_cache_path=os.path.join(output_path, "phoneme_cache"),
    stopnet_pos_weight=15,
    print_step=10,
    print_eval=True,
    mixed_precision=False,
    seq_len_norm=True,
    output_path=output_path,
    datasets=[dataset_config],
    lr=1e-3,
    lr_scheduler="StepwiseGradualLR",
    lr_scheduler_params={
        "gradual_learning_rates": [
            [0, 1e-3],
            [2e4, 5e-4],
            [4e5, 3e-4],
            [6e4, 1e-4],
            [8e4, 5e-5],
        ]
    },
    scheduler_after_epoch=False,  # scheduler doesn't work without this flag
    # Need to experiment with these below for capacitron
)
ap = AudioProcessor(**config.audio.to_dict())
tokenizer, config = TTSTokenizer.init_from_config(config)
train_samples, eval_samples = load_tts_samples(dataset_config, eval_split=True)
model = Tacotron2(config, ap, tokenizer, speaker_manager=None)
trainer = Trainer(
    TrainerArgs(),
    config,
    output_path,
    model=model,
    train_samples=train_samples,
    eval_samples=eval_samples,
    training_assets={"audio_processor": ap},
)
trainer.fit()