JarodMica / ai-voice-cloning


Hangs indefinitely when training begins, unable to find any additional logs #101

Open ttjensen opened 6 months ago

ttjensen commented 6 months ago

I've been able to successfully prepare my dataset and configuration, but no matter which configuration settings I change, I always get stuck right at the beginning of training. There is no activity in the UI once I reach this point, and no further activity in the console. I also don't believe I'm getting the full error message here; it simply ends with [Training] [2024-05-08T15:41:17.382663] warnings.warn(
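One way to check whether the process is actually doing work while the console is silent is to watch GPU activity from a second terminal. A minimal sketch, assuming an NVIDIA GPU and the nvidia-ml-py (pynvml) package, which is not part of this repo:

```python
# Poll GPU utilization and memory while the trainer appears hung; if both
# stay busy, the process is computing rather than deadlocked.
import time

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # matches gpu_ids: [0] in the log below
try:
    while True:
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU util {util.gpu:3d}% | VRAM {mem.used / 2**30:.1f} GiB")
        time.sleep(5)
finally:
    pynvml.nvmlShutdown()
```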

For anyone who encountered a similar issue, were you able to get past it?


Spawning process:  train.bat ./training/threeDog/train.yaml
[Training] [2024-05-08T15:40:51.832648]
[Training] [2024-05-08T15:40:51.833616] (venv) D:\Projects\threeDog\ai-voice-cloning>call .\venv\Scripts\activate.bat
[Training] [2024-05-08T15:40:53.450392] W0508 15:40:53.450000 4672 torch\distributed\elastic\multiprocessing\redirects.py:27] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2024-05-08T15:40:55.680186] 24-05-08 15:40:55.680 - INFO:   name: threeDog
[Training] [2024-05-08T15:40:55.681183]   model: extensibletrainer
[Training] [2024-05-08T15:40:55.682181]   scale: 1
[Training] [2024-05-08T15:40:55.682181]   gpu_ids: [0]
[Training] [2024-05-08T15:40:55.683178]   start_step: 0
[Training] [2024-05-08T15:40:55.684175]   checkpointing_enabled: True
[Training] [2024-05-08T15:40:55.684175]   fp16: False
[Training] [2024-05-08T15:40:55.685172]   bitsandbytes: False
[Training] [2024-05-08T15:40:55.686171]   gpus: 1
[Training] [2024-05-08T15:40:55.686171]   datasets:[
[Training] [2024-05-08T15:40:55.687168]     train:[
[Training] [2024-05-08T15:40:55.688165]       name: training
[Training] [2024-05-08T15:40:55.688165]       n_workers: 1
[Training] [2024-05-08T15:40:55.689162]       batch_size: 128
[Training] [2024-05-08T15:40:55.690159]       mode: paired_voice_audio
[Training] [2024-05-08T15:40:55.690159]       path: ./training/threeDog/train.txt
[Training] [2024-05-08T15:40:55.691156]       fetcher_mode: ['lj']
[Training] [2024-05-08T15:40:55.691156]       phase: train
[Training] [2024-05-08T15:40:55.692154]       max_wav_length: 255995
[Training] [2024-05-08T15:40:55.693151]       max_text_length: 200
[Training] [2024-05-08T15:40:55.694167]       sample_rate: 22050
[Training] [2024-05-08T15:40:55.695169]       load_conditioning: True
[Training] [2024-05-08T15:40:55.695169]       num_conditioning_candidates: 2
[Training] [2024-05-08T15:40:55.696146]       conditioning_length: 44000
[Training] [2024-05-08T15:40:55.696146]       use_bpe_tokenizer: True
[Training] [2024-05-08T15:40:55.697141]       tokenizer_vocab: ./models/tokenizers/en_tokenizer.json
[Training] [2024-05-08T15:40:55.698138]       load_aligned_codes: False
[Training] [2024-05-08T15:40:55.698138]       data_type: img
[Training] [2024-05-08T15:40:55.699135]     ]
[Training] [2024-05-08T15:40:55.700152]     val:[
[Training] [2024-05-08T15:40:55.700152]       name: validation
[Training] [2024-05-08T15:40:55.701130]       n_workers: 1
[Training] [2024-05-08T15:40:55.701130]       batch_size: 4
[Training] [2024-05-08T15:40:55.702127]       mode: paired_voice_audio
[Training] [2024-05-08T15:40:55.703125]       path: ./training/threeDog/validation.txt
[Training] [2024-05-08T15:40:55.703125]       fetcher_mode: ['lj']
[Training] [2024-05-08T15:40:55.704122]       phase: val
[Training] [2024-05-08T15:40:55.704122]       max_wav_length: 255995
[Training] [2024-05-08T15:40:55.705119]       max_text_length: 200
[Training] [2024-05-08T15:40:55.706116]       sample_rate: 22050
[Training] [2024-05-08T15:40:55.706116]       load_conditioning: True
[Training] [2024-05-08T15:40:55.707114]       num_conditioning_candidates: 2
[Training] [2024-05-08T15:40:55.707114]       conditioning_length: 44000
[Training] [2024-05-08T15:40:55.708111]       use_bpe_tokenizer: True
[Training] [2024-05-08T15:40:55.709109]       tokenizer_vocab: ./models/tokenizers/en_tokenizer.json
[Training] [2024-05-08T15:40:55.710106]       load_aligned_codes: False
[Training] [2024-05-08T15:40:55.710106]       data_type: img
[Training] [2024-05-08T15:40:55.711103]     ]
[Training] [2024-05-08T15:40:55.711103]   ]
[Training] [2024-05-08T15:40:55.712100]   steps:[
[Training] [2024-05-08T15:40:55.712100]     gpt_train:[
[Training] [2024-05-08T15:40:55.713097]       training: gpt
[Training] [2024-05-08T15:40:55.713097]       loss_log_buffer: 500
[Training] [2024-05-08T15:40:55.714095]       optimizer: adamw
[Training] [2024-05-08T15:40:55.714095]       optimizer_params:[
[Training] [2024-05-08T15:40:55.715094]         lr: 0.0001
[Training] [2024-05-08T15:40:55.715094]         weight_decay: 0.01
[Training] [2024-05-08T15:40:55.716090]         beta1: 0.9
[Training] [2024-05-08T15:40:55.716090]         beta2: 0.96
[Training] [2024-05-08T15:40:55.717087]       ]
[Training] [2024-05-08T15:40:55.717087]       clip_grad_eps: 4
[Training] [2024-05-08T15:40:55.718085]       injectors:[
[Training] [2024-05-08T15:40:55.719082]         paired_to_mel:[
[Training] [2024-05-08T15:40:55.719082]           type: torch_mel_spectrogram
[Training] [2024-05-08T15:40:55.720079]           mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2024-05-08T15:40:55.720079]           in: wav
[Training] [2024-05-08T15:40:55.721077]           out: paired_mel
[Training] [2024-05-08T15:40:55.722073]         ]
[Training] [2024-05-08T15:40:55.722073]         paired_cond_to_mel:[
[Training] [2024-05-08T15:40:55.723072]           type: for_each
[Training] [2024-05-08T15:40:55.723072]           subtype: torch_mel_spectrogram
[Training] [2024-05-08T15:40:55.724069]           mel_norm_file: ./modules/tortoise-tts/tortoise/data/mel_norms.pth
[Training] [2024-05-08T15:40:55.725066]           in: conditioning
[Training] [2024-05-08T15:40:55.726063]           out: paired_conditioning_mel
[Training] [2024-05-08T15:40:55.726063]         ]
[Training] [2024-05-08T15:40:55.727061]         to_codes:[
[Training] [2024-05-08T15:40:55.728057]           type: discrete_token
[Training] [2024-05-08T15:40:55.730053]           in: paired_mel
[Training] [2024-05-08T15:40:55.731051]           out: paired_mel_codes
[Training] [2024-05-08T15:40:55.732048]           dvae_config: ./models/tortoise/train_diffusion_vocoder_22k_level.yml
[Training] [2024-05-08T15:40:55.733044]         ]
[Training] [2024-05-08T15:40:55.734733]         paired_fwd_text:[
[Training] [2024-05-08T15:40:55.736724]           type: generator
[Training] [2024-05-08T15:40:55.737722]           generator: gpt
[Training] [2024-05-08T15:40:55.739717]           in: ['paired_conditioning_mel', 'padded_text', 'text_lengths', 'paired_mel_codes', 'wav_lengths']
[Training] [2024-05-08T15:40:55.741711]           out: ['loss_text_ce', 'loss_mel_ce', 'logits']
[Training] [2024-05-08T15:40:55.743706]         ]
[Training] [2024-05-08T15:40:55.746698]       ]
[Training] [2024-05-08T15:40:55.749691]       losses:[
[Training] [2024-05-08T15:40:55.751684]         text_ce:[
[Training] [2024-05-08T15:40:55.753679]           type: direct
[Training] [2024-05-08T15:40:55.755674]           weight: 1
[Training] [2024-05-08T15:40:55.757669]           key: loss_text_ce
[Training] [2024-05-08T15:40:55.759663]         ]
[Training] [2024-05-08T15:40:55.760661]         mel_ce:[
[Training] [2024-05-08T15:40:55.762655]           type: direct
[Training] [2024-05-08T15:40:55.763653]           weight: 1
[Training] [2024-05-08T15:40:55.765647]           key: loss_mel_ce
[Training] [2024-05-08T15:40:55.766644]         ]
[Training] [2024-05-08T15:40:55.768639]       ]
[Training] [2024-05-08T15:40:55.770634]     ]
[Training] [2024-05-08T15:40:55.773626]   ]
[Training] [2024-05-08T15:40:55.775620]   networks:[
[Training] [2024-05-08T15:40:55.777616]     gpt:[
[Training] [2024-05-08T15:40:55.778612]       type: generator
[Training] [2024-05-08T15:40:55.780630]       which_model_G: unified_voice2
[Training] [2024-05-08T15:40:55.782603]       kwargs:[
[Training] [2024-05-08T15:40:55.783600]         layers: 30
[Training] [2024-05-08T15:40:55.784597]         model_dim: 1024
[Training] [2024-05-08T15:40:55.785594]         heads: 16
[Training] [2024-05-08T15:40:55.785594]         max_text_tokens: 402
[Training] [2024-05-08T15:40:55.786590]         max_mel_tokens: 604
[Training] [2024-05-08T15:40:55.787588]         max_conditioning_inputs: 2
[Training] [2024-05-08T15:40:55.788586]         mel_length_compression: 1024
[Training] [2024-05-08T15:40:55.789583]         number_text_tokens: 256
[Training] [2024-05-08T15:40:55.789583]         number_mel_codes: 8194
[Training] [2024-05-08T15:40:55.790580]         start_mel_token: 8192
[Training] [2024-05-08T15:40:55.791578]         stop_mel_token: 8193
[Training] [2024-05-08T15:40:55.792576]         start_text_token: 255
[Training] [2024-05-08T15:40:55.793575]         train_solo_embeddings: False
[Training] [2024-05-08T15:40:55.794570]         use_mel_codes_as_input: True
[Training] [2024-05-08T15:40:55.795567]         checkpointing: True
[Training] [2024-05-08T15:40:55.796564]         tortoise_compat: True
[Training] [2024-05-08T15:40:55.797562]       ]
[Training] [2024-05-08T15:40:55.797562]     ]
[Training] [2024-05-08T15:40:55.798559]   ]
[Training] [2024-05-08T15:40:55.798559]   path:[
[Training] [2024-05-08T15:40:55.799556]     strict_load: True
[Training] [2024-05-08T15:40:55.799556]     pretrain_model_gpt: ./models/tortoise/autoregressive.pth
[Training] [2024-05-08T15:40:55.800554]     root: ./
[Training] [2024-05-08T15:40:55.801552]     experiments_root: ./training\threeDog\finetune
[Training] [2024-05-08T15:40:55.802549]     models: ./training\threeDog\finetune\models
[Training] [2024-05-08T15:40:55.803547]     training_state: ./training\threeDog\finetune\training_state
[Training] [2024-05-08T15:40:55.804554]     log: ./training\threeDog\finetune
[Training] [2024-05-08T15:40:55.805541]     val_images: ./training\threeDog\finetune\val_images
[Training] [2024-05-08T15:40:55.805541]   ]
[Training] [2024-05-08T15:40:55.806537]   train:[
[Training] [2024-05-08T15:40:55.806537]     niter: 80
[Training] [2024-05-08T15:40:55.807535]     warmup_iter: -1
[Training] [2024-05-08T15:40:55.807535]     mega_batch_factor: 32
[Training] [2024-05-08T15:40:55.808533]     val_freq: 40
[Training] [2024-05-08T15:40:55.808533]     ema_enabled: False
[Training] [2024-05-08T15:40:55.809531]     default_lr_scheme: MultiStepLR
[Training] [2024-05-08T15:40:55.810527]     gen_lr_steps: [16, 32, 72, 144, 200, 264, 400]
[Training] [2024-05-08T15:40:55.810527]     lr_gamma: 0.5
[Training] [2024-05-08T15:40:55.811524]   ]
[Training] [2024-05-08T15:40:55.811524]   eval:[
[Training] [2024-05-08T15:40:55.812522]     pure: False
[Training] [2024-05-08T15:40:55.812522]     output_state: gen
[Training] [2024-05-08T15:40:55.813520]   ]
[Training] [2024-05-08T15:40:55.814517]   logger:[
[Training] [2024-05-08T15:40:55.815515]     save_checkpoint_freq: 40
[Training] [2024-05-08T15:40:55.815515]     visuals: ['gen', 'mel']
[Training] [2024-05-08T15:40:55.816512]     visual_debug_rate: 40
[Training] [2024-05-08T15:40:55.817509]     is_mel_spectrogram: True
[Training] [2024-05-08T15:40:55.817509]   ]
[Training] [2024-05-08T15:40:55.820500]   is_train: True
[Training] [2024-05-08T15:40:55.820500]   dist: False
[Training] [2024-05-08T15:40:55.821497]
[Training] [2024-05-08T15:40:55.823493] 24-05-08 15:40:55.680 - INFO: Random seed: 9150
[Training] [2024-05-08T15:40:58.304546] 24-05-08 15:40:58.304 - INFO: Number of training data elements: 914, iters: 8
[Training] [2024-05-08T15:40:58.305542] 24-05-08 15:40:58.304 - INFO: Total epochs needed: 10 for iters 80
[Training] [2024-05-08T15:40:59.278111] D:\Projects\threeDog\ai-voice-cloning\venv\Lib\site-packages\transformers\configuration_utils.py:380: UserWarning: Passing `gradient_checkpointing` to a config initialization is deprecated and will be removed in v5 Transformers. Using `model.gradient_checkpointing_enable()` instead, or if you are using the `Trainer` API, pass `gradient_checkpointing=True` in your `TrainingArguments`.
[Training] [2024-05-08T15:40:59.279110]   warnings.warn(
[Training] [2024-05-08T15:41:05.793832] 24-05-08 15:41:05.793 - INFO: Loading model for [./models/tortoise/autoregressive.pth]
[Training] [2024-05-08T15:41:06.457873] 24-05-08 15:41:06.451 - INFO: Start training from epoch: 0, iter: 0
[Training] [2024-05-08T15:41:08.011208] W0508 15:41:08.011000 6404 torch\distributed\elastic\multiprocessing\redirects.py:27] NOTE: Redirects are currently not supported in Windows or MacOs.
[Training] [2024-05-08T15:41:16.361412] D:\Projects\threeDog\ai-voice-cloning\venv\Lib\site-packages\torch\optim\lr_scheduler.py:143: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
[Training] [2024-05-08T15:41:16.361412]   warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
[Training] [2024-05-08T15:41:17.381665] D:\Projects\threeDog\ai-voice-cloning\venv\Lib\site-packages\torch\utils\checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
[Training] [2024-05-08T15:41:17.382663]   warnings.warn(
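For reference, the last two messages above are standard PyTorch deprecation warnings, not errors. The final one refers to the torch.utils.checkpoint API; a minimal sketch of the call shape only (this is not the DLAS trainer's actual code):

```python
# Passing use_reentrant explicitly silences the deprecation warning seen
# above; False is the variant PyTorch recommends going forward.
import torch
import torch.utils.checkpoint as cp

layer = torch.nn.Linear(16, 16)
x = torch.randn(4, 16, requires_grad=True)
y = cp.checkpoint(layer, x, use_reentrant=False)
y.sum().backward()  # activations are recomputed here instead of stored
```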
ttjensen commented 6 months ago

It looks like I was able to get a little more information after a KeyboardInterrupt:

Keyboard interruption in main thread... closing server.
[Training] [2024-05-08T15:48:46.635992] Disabled distributed training.
[Training] [2024-05-08T15:48:46.635992] Path already exists. Rename it to [./training\threeDog\finetune_archived_240508-154055]
[Training] [2024-05-08T15:48:46.636971] Loading from ./models/tortoise/dvae.pth
[Training] [2024-05-08T15:48:46.636971] Traceback (most recent call last):
[Training] [2024-05-08T15:48:46.636971]   File "D:\Projects\threeDog\ai-voice-cloning\src\train.py", line 72, in <module>
[Training] [2024-05-08T15:48:46.636971]     train(config_path, args.launcher)
[Training] [2024-05-08T15:48:46.636971]   File "D:\Projects\threeDog\ai-voice-cloning\src\train.py", line 39, in train
[Training] [2024-05-08T15:48:46.637968]     trainer.do_training()
[Training] [2024-05-08T15:48:46.637968]   File "D:\Projects\threeDog\ai-voice-cloning\modules\dlas\dlas\train.py", line 408, in do_training
[Training] [2024-05-08T15:48:46.638975]     metric = self.do_step(train_data)
[Training] [2024-05-08T15:48:46.638975]              ^^^^^^^^^^^^^^^^^^^^^^^^
[Training] [2024-05-08T15:48:46.638975]   File "D:\Projects\threeDog\ai-voice-cloning\modules\dlas\dlas\train.py", line 271, in do_step
[Training] [2024-05-08T15:48:46.639963]     gradient_norms_dict = self.model.optimize_parameters(
[Training] [2024-05-08T15:48:46.639963]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Training] [2024-05-08T15:48:46.639963]   File "D:\Projects\threeDog\ai-voice-cloning\modules\dlas\dlas\trainer\ExtensibleTrainer.py", line 321, in optimize_parameters
[Training] [2024-05-08T15:48:46.639963]     ns = step.do_forward_backward(
[Training] [2024-05-08T15:48:46.640960]          ^^^^^^^^^^^^^^^^^^^^^^^^^
[Training] [2024-05-08T15:48:46.640960]   File "D:\Projects\threeDog\ai-voice-cloning\modules\dlas\dlas\trainer\steps.py", line 322, in do_forward_backward
[Training] [2024-05-08T15:48:46.640960]     self.scaler.scale(total_loss).backward()
[Training] [2024-05-08T15:48:46.640960]   File "D:\Projects\threeDog\ai-voice-cloning\venv\Lib\site-packages\torch\_tensor.py", line 525, in backward
[Training] [2024-05-08T15:48:46.641962]     torch.autograd.backward(
[Training] [2024-05-08T15:48:46.641962]   File "D:\Projects\threeDog\ai-voice-cloning\venv\Lib\site-packages\torch\autograd\__init__.py", line 267, in backward
[Training] [2024-05-08T15:48:46.641962]     _engine_run_backward(
[Training] [2024-05-08T15:48:46.642968]   File "D:\Projects\threeDog\ai-voice-cloning\venv\Lib\site-packages\torch\autograd\graph.py", line 744, in _engine_run_backward
[Training] [2024-05-08T15:48:46.642968]     return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[Training] [2024-05-08T15:48:46.642968]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Training] [2024-05-08T15:48:46.642968] KeyboardInterrupt
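Note that the interrupt landed inside the backward pass (self.scaler.scale(total_loss).backward()), which suggests the trainer was still computing rather than deadlocked. A back-of-envelope check against the numbers in the log above, nothing more:

```python
# 914 training elements at batch_size 128 give ceil(914 / 128) = 8
# iterations per epoch, matching the reported "iters: 8" -- so each
# visible step covers 128 clips of compute before any progress shows.
import math

elements, batch_size = 914, 128
print(math.ceil(elements / batch_size))  # -> 8
```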
bumajzl01 commented 6 months ago

For me there is a significant pause between this warnings.warn() and the training progress showing in the UI. It eventually does appear, but I sometimes have to wait up to 5 minutes...
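If the pause scales with batch size, one quick experiment is to lower batch_size in train.yaml (128 in the log above) and see whether the first step appears sooner. A minimal sketch, assuming PyYAML and that this particular config round-trips cleanly through safe_load/safe_dump; back the file up first, since the UI may regenerate it:

```python
# Shrink the training batch as a diagnostic experiment, not a fix.
import yaml

path = "./training/threeDog/train.yaml"
with open(path) as f:
    cfg = yaml.safe_load(f)

cfg["datasets"]["train"]["batch_size"] = 32  # arbitrary test value, down from 128
with open(path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```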