erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low VRAM support, DeepSpeed, narrator, model finetuning, custom models, and wav file maintenance. It can also be used with 3rd-party software via JSON calls.
GNU Affero General Public License v3.0

How to further train a successfully trained model? #282

Closed maxbizz closed 4 months ago

maxbizz commented 4 months ago

Yesterday I completed training a model for 10 epochs with a dataset of 200 clips. I'm not satisfied with the result and want to continue training, but I'm unable to do so: I get an out-of-memory error that crashes the whole process. My question is: how do I further train a successfully trained model?
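For reference, the figures in the CUDA out-of-memory message in the log below can be checked with a short script. This is only a sketch: the message text is copied from the log, while the helper names (`to_mib`, `summarize_oom`) are made up for illustration and are not part of AllTalk or PyTorch. It shows that PyTorch reports about 7.72 GiB more allocated memory than the 12 GiB card holds, which would be consistent with the pre-flight warning that Windows uses system RAM as extended VRAM. The reserved-but-unallocated amount is only about 208 MiB, so the fragmentation hint in the message may not be the main issue here.

```python
import re

# OOM message text copied from the log in this issue.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 194.00 MiB. "
    "GPU 0 has a total capacity of 12.00 GiB of which 0 bytes is free. "
    "Of the allocated memory 19.72 GiB is allocated by PyTorch, "
    "and 208.33 MiB is reserved by PyTorch but unallocated."
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def to_mib(value: str, unit: str) -> float:
    """Convert a number with a MiB/GiB unit to MiB (hypothetical helper)."""
    return float(value) * UNIT_TO_MIB[unit]


def summarize_oom(message: str) -> dict:
    """Pull the memory figures out of an OOM message (hypothetical helper)."""
    patterns = {
        "requested_mib": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "capacity_mib": r"total capacity of ([\d.]+) (MiB|GiB)",
        "allocated_mib": r"([\d.]+) (MiB|GiB) is allocated by PyTorch",
        "reserved_unallocated_mib": r"([\d.]+) (MiB|GiB) is reserved by PyTorch",
    }
    summary = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find {key} in message")
        summary[key] = to_mib(*match.groups())
    # How far the reported allocation exceeds the physical card capacity.
    summary["over_capacity_mib"] = (
        summary["allocated_mib"] - summary["capacity_mib"]
    )
    return summary


summary = summarize_oom(OOM_MESSAGE)
for name, mib in summary.items():
    print(f"{name}: {mib:.2f} MiB ({mib / 1024:.2f} GiB)")
```

If the overflow really is going to shared system memory, free system RAM would matter as much as VRAM for this run, which may also relate to the separate `MemoryError` raised by matplotlib in the first traceback.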

Running on local URL:  http://127.0.0.1:7052
[FINETUNE] HTTP Request: GET https://checkip.amazonaws.com/ "HTTP/1.1 200 "
[FINETUNE] HTTP Request: GET http://127.0.0.1:7052/startup-events "HTTP/1.1 200 OK"
[FINETUNE] HTTP Request: HEAD http://127.0.0.1:7052/ "HTTP/1.1 200 OK"

To create a public link, set `share=True` in `launch()`.
[FINETUNE] HTTP Request: GET https://api.gradio.app/pkg-version "HTTP/1.1 200 OK"
[FINETUNE] Starting Step 2 - Fine-tuning the XTTS Encoder
[FINETUNE] Language: en Epochs: 12 Batch size: 2 Grad accumulation steps: 1
[FINETUNE] Training   : I:\AI\alltalk_tts\finetune\coffeehousecrime1\metadata_train.csv
[FINETUNE] Evaluation : I:\AI\alltalk_tts\finetune\coffeehousecrime1\metadata_eval.csv
[FINETUNE] Model used : I:\AI\alltalk_tts\models\xtts\xttsv2_2.0.3
[FINETUNE] Available VRAM: 12.00 GB
[FINETUNE]
[FINETUNE] ****** WARNING PRE-FLIGHT CHECKS FAILED ******* WARNING PRE-FLIGHT CHECKS FAILED *****
[FINETUNE] Available VRAM: 12.00 GB
[FINETUNE] If you are running on a Linux system and you have 12GB's or less of VRAM, this step
[FINETUNE] may fail, due to not enough GPU VRAM. Windows systems will use system RAM as extended
[FINETUNE] VRAM and so should work ok. However, Windows machines will need enough System RAM
[FINETUNE] available. Please read the PFC help section available on the first tab of the web
[FINETUNE] interface for more information.
[FINETUNE] ****** WARNING PRE-FLIGHT CHECKS FAILED ******* WARNING PRE-FLIGHT CHECKS FAILED *****
[FINETUNE]
[FINETUNE] Continuing previous fine tuning I:\AI\alltalk_tts\finetune\coffeehousecrime1\training\XTTS_FT-July-26-2024_01+52AM-b03e084\best_model_770.pth
[FINETUNE] Learning Scheduler CosineAnnealingWarmRestarts, params {'T_0': 3, 'T_mult': 1, 'eta_min': 1e-06, 'last_epoch': -1}
[FINETUNE] DVAE weights restored from: I:\AI\alltalk_tts\models\xtts\xttsv2_2.0.3\dvae.pth
[FINETUNE] Found 219 files in I:\AI\alltalk_tts\finetune\coffeehousecrime1
continue_path training from previous run.
 > Training Environment:
 | > Backend: Torch
 | > Mixed precision: False
 | > Precision: float32
 | > Current device: 0
 | > Num. of GPUs: 1
 | > Num. of CPUs: 12
 | > Num. of Torch Threads: 1
 | > Torch seed: 54321
 | > Torch CUDNN: True
 | > Torch CUDNN deterministic: False
 | > Torch CUDNN benchmark: False
 | > Torch TF32 MatMul: False
 > Start Tensorboard: tensorboard --logdir=I:\AI\alltalk_tts\finetune\coffeehousecrime1\training\XTTS_FT-July-26-2024_01+52AM-b03e084\
 > Restoring from checkpoint_1000.pth ...
 > Restoring Model...
 > Restoring Optimizer...
 > Model restored from step 1000

 > Model has 518442047 parameters
 > Restoring best loss from best_model_770.pth ...
 > Starting with loaded last best loss {'train_loss': 2.6049492359161377, 'eval_loss': 3.037793901231554}

 > EPOCH: 0/1000
 --> I:\AI\alltalk_tts\finetune\coffeehousecrime1\training\XTTS_FT-July-26-2024_01+52AM-b03e084\
[FINETUNE] Sampling by language: dict_keys(['en'])

 > TRAINING (2024-07-26 08:59:27)
Traceback (most recent call last):
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\queueing.py", line 560, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1945, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1525, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\utils.py", line 655, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\utils.py", line 781, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\utils.py", line 747, in continuous_coro
    output = fn(*args)
             ^^^^^^^^^
  File "I:\AI\alltalk_tts\finetune.py", line 1270, in load_metrics
    return c_logger.plot_metrics(), f"Running Time: {c_logger.format_duration(c_logger.total_duration)} - Estimated Completion: {c_logger.format_duration(c_logger.estimated_duration)}"
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\metrics_logger.py", line 158, in plot_metrics
    plt.tight_layout()
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\matplotlib\pyplot.py", line 2801, in tight_layout
    gcf().tight_layout(pad=pad, h_pad=h_pad, w_pad=w_pad, rect=rect)
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\matplotlib\figure.py", line 3545, in tight_layout
    engine.execute(self)
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\matplotlib\layout_engine.py", line 181, in execute
    renderer = fig._get_renderer()
               ^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\matplotlib\figure.py", line 2762, in _get_renderer
    return self.canvas.get_renderer()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\matplotlib\backends\backend_agg.py", line 397, in get_renderer
    self.renderer = RendererAgg(w, h, self.figure.dpi)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\matplotlib\backends\backend_agg.py", line 70, in __init__
    self._renderer = _RendererAgg(int(width), int(height), dpi)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
MemoryError: In RendererAgg: Out of memory
 ! Run is kept in I:\AI\alltalk_tts\finetune\coffeehousecrime1\training\XTTS_FT-July-26-2024_01+52AM-b03e084\
Traceback (most recent call last):
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1877, in fit
    self._fit()
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1828, in _fit
    self.train_epoch()
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1543, in train_epoch
    outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1399, in train_step
    outputs, loss_dict_new, step_time = self.optimize(
                                        ^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1265, in optimize
    outputs, loss_dict = self._compute_loss(
                         ^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1196, in _compute_loss
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1155, in _model_train_step
    return model.train_step(*input_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\TTS\tts\layers\xtts\trainer\gpt_trainer.py", line 310, in train_step
    loss_text, loss_mel, _ = self.forward(
                             ^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\TTS\tts\layers\xtts\trainer\gpt_trainer.py", line 217, in forward
    losses = self.xtts.gpt(
             ^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\TTS\tts\layers\xtts\gpt.py", line 510, in forward
    text_logits, mel_logits = self.get_logits(
                              ^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\TTS\tts\layers\xtts\gpt.py", line 278, in get_logits
    gpt_out = self.gpt(
              ^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 1119, in forward
    outputs = block(
              ^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 654, in forward
    feed_forward_hidden_states = self.mlp(hidden_states)
                                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 575, in forward
    hidden_states = self.act(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\transformers\activations.py", line 56, in forward
    return 0.5 * input * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (input + 0.044715 * torch.pow(input, 3.0))))
           ~~~~^~~~~~~
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 194.00 MiB. GPU 0 has a total capacity of 12.00 GiB of which 0 bytes is free. Of the allocated memory 19.72 GiB is allocated by PyTorch, and 208.33 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
ERROR:    Traceback (most recent call last):
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1877, in fit
    self._fit()
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1828, in _fit
    self.train_epoch()
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1543, in train_epoch
    outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1399, in train_step
    outputs, loss_dict_new, step_time = self.optimize(
                                        ^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1265, in optimize
    outputs, loss_dict = self._compute_loss(
                         ^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1196, in _compute_loss
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1155, in _model_train_step
    return model.train_step(*input_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\TTS\tts\layers\xtts\trainer\gpt_trainer.py", line 310, in train_step
    loss_text, loss_mel, _ = self.forward(
                             ^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\TTS\tts\layers\xtts\trainer\gpt_trainer.py", line 217, in forward
    losses = self.xtts.gpt(
             ^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\TTS\tts\layers\xtts\gpt.py", line 510, in forward
    text_logits, mel_logits = self.get_logits(
                              ^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\TTS\tts\layers\xtts\gpt.py", line 278, in get_logits
    gpt_out = self.gpt(
              ^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 1119, in forward
    outputs = block(
              ^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 654, in forward
    feed_forward_hidden_states = self.mlp(hidden_states)
                                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 575, in forward
    hidden_states = self.act(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\transformers\activations.py", line 56, in forward
    return 0.5 * input * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (input + 0.044715 * torch.pow(input, 3.0))))
           ~~~~^~~~~~~
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 194.00 MiB. GPU 0 has a total capacity of 12.00 GiB of which 0 bytes is free. Of the allocated memory 19.72 GiB is allocated by PyTorch, and 208.33 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\asyncio\base_events.py", line 641, in run_until_complete
    self.run_forever()
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\asyncio\windows_events.py", line 321, in run_forever
    super().run_forever()
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\asyncio\base_events.py", line 608, in run_forever
    self._run_once()
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\asyncio\base_events.py", line 1936, in _run_once
    handle._run()
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\asyncio\events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\queueing.py", line 521, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1945, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1513, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\utils.py", line 831, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\finetune.py", line 1934, in train_model
    config_path, original_xtts_checkpoint, vocab_file, exp_path, speaker_wav = train_gpt(language, num_epochs, batch_size, grad_acumm, train_csv, eval_csv, learning_rate, model_to_train, continue_run, disable_shared_memory, learning_rate_scheduler, optimizer, warm_up, max_audio_length=max_audio_length, progress=gr.Progress())
                                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\finetune.py", line 995, in train_gpt
    trainer.fit()
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1906, in fit
    sys.exit(1)
SystemExit: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\starlette\routing.py", line 741, in lifespan
    await receive()
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\uvicorn\lifespan\on.py", line 137, in receive
    return await self.receive_queue.get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\asyncio\queues.py", line 158, in get
    await getter
asyncio.exceptions.CancelledError

ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1877, in fit
    self._fit()
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1828, in _fit
    self.train_epoch()
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1543, in train_epoch
    outputs, _ = self.train_step(batch, batch_num_steps, cur_step, loader_start_time)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1399, in train_step
    outputs, loss_dict_new, step_time = self.optimize(
                                        ^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1265, in optimize
    outputs, loss_dict = self._compute_loss(
                         ^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1196, in _compute_loss
    outputs, loss_dict = self._model_train_step(batch, model, criterion)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1155, in _model_train_step
    return model.train_step(*input_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\TTS\tts\layers\xtts\trainer\gpt_trainer.py", line 310, in train_step
    loss_text, loss_mel, _ = self.forward(
                             ^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\TTS\tts\layers\xtts\trainer\gpt_trainer.py", line 217, in forward
    losses = self.xtts.gpt(
             ^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\TTS\tts\layers\xtts\gpt.py", line 510, in forward
    text_logits, mel_logits = self.get_logits(
                              ^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\TTS\tts\layers\xtts\gpt.py", line 278, in get_logits
    gpt_out = self.gpt(
              ^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 1119, in forward
    outputs = block(
              ^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 654, in forward
    feed_forward_hidden_states = self.mlp(hidden_states)
                                 ^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\transformers\models\gpt2\modeling_gpt2.py", line 575, in forward
    hidden_states = self.act(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\transformers\activations.py", line 56, in forward
    return 0.5 * input * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (input + 0.044715 * torch.pow(input, 3.0))))
           ~~~~^~~~~~~
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 194.00 MiB. GPU 0 has a total capacity of 12.00 GiB of which 0 bytes is free. Of the allocated memory 19.72 GiB is allocated by PyTorch, and 208.33 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\asyncio\base_events.py", line 641, in run_until_complete
    self.run_forever()
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\asyncio\windows_events.py", line 321, in run_forever
    super().run_forever()
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\asyncio\base_events.py", line 608, in run_forever
    self._run_once()
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\asyncio\base_events.py", line 1936, in _run_once
    handle._run()
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\asyncio\events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\queueing.py", line 521, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1945, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\blocks.py", line 1513, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\anyio\_backends\_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\utils.py", line 831, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\finetune.py", line 1934, in train_model
    config_path, original_xtts_checkpoint, vocab_file, exp_path, speaker_wav = train_gpt(language, num_epochs, batch_size, grad_acumm, train_csv, eval_csv, learning_rate, model_to_train, continue_run, disable_shared_memory, learning_rate_scheduler, optimizer, warm_up, max_audio_length=max_audio_length, progress=gr.Progress())
                                                                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\finetune.py", line 995, in train_gpt
    trainer.fit()
  File "I:\AI\alltalk_tts\trainer_alltalk\trainer.py", line 1906, in fit
    sys.exit(1)
SystemExit: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\uvicorn\protocols\http\httptools_impl.py", line 399, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\uvicorn\middleware\proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\fastapi\applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\starlette\applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\starlette\middleware\errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\gradio\route_utils.py", line 714, in __call__
    await self.app(scope, receive, send)
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\starlette\middleware\exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\starlette\routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\starlette\routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\starlette\routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\starlette\routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\starlette\_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\starlette\routing.py", line 75, in app
    await response(scope, receive, send)
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\starlette\responses.py", line 265, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\starlette\responses.py", line 261, in wrap
    await func()
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\starlette\responses.py", line 238, in listen_for_disconnect
    message = await receive()
              ^^^^^^^^^^^^^^^
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\site-packages\uvicorn\protocols\http\httptools_impl.py", line 553, in receive
    await self.message_event.wait()
  File "I:\AI\alltalk_tts\alltalk_environment\env\Lib\asyncio\locks.py", line 213, in wait
    await fut
asyncio.exceptions.CancelledError
erew123 commented 4 months ago

Hi @maxbizz

I see you are on Windows with a 12GB card. Please ensure that:

Worst case, if the issue persists, you can go to the "What to do next" tab, refresh the dropdowns, and use "Compact and move model". This copies the LAST completed EPOCH of the model to a folder of your choosing. From that point on you can train it like any other model, with no need to select "continue previous project".

Thanks

maxbizz commented 4 months ago

I can confirm that

I tried "compact and move model", which created a model folder inside the models/xtts folder. But when I try to train this model, I don't get an option to choose it on the "step-2 Training" page. The only model available to choose is always xttsv2_2.0.3.

erew123 commented 4 months ago

Hi @maxbizz

Strange one. If you have it set that way, any memory needed beyond the VRAM should spill over into system RAM, and you shouldn't get an out-of-memory error.

Likewise, if you have performed the "compact and move model" step and the folder exists, you should be able to continue the training. In fairness, though, you may need to close and restart finetuning for it to identify the folder as a trainable model.

Would you be able to confirm to me that you have these files in your new folder:

[image: screenshot of the expected model files]

The actual file sizes should be the same as well.
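
If comparing the folders by eye is awkward, a small script can do the check. The sketch below is only an illustration, not part of AllTalk: it assumes the base model folder (`xttsv2_2.0.3`, taken from the log above) is a fair reference for file names and sizes, and the new folder name `my_finetuned_model` is hypothetical. Treat any size difference it reports as a hint to look closer rather than proof of a problem.

```python
from pathlib import Path


def file_sizes(folder):
    """Map each file name directly inside `folder` to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(folder).iterdir() if p.is_file()}


def compare_model_folders(reference, candidate):
    """Return (missing, size_mismatch): files absent from `candidate`,
    and files present in both folders whose sizes differ."""
    ref = file_sizes(reference)
    cand = file_sizes(candidate)
    missing = sorted(name for name in ref if name not in cand)
    size_mismatch = sorted(
        name for name in ref if name in cand and ref[name] != cand[name]
    )
    return missing, size_mismatch


if __name__ == "__main__":
    # Both paths are examples; adjust them to your own installation.
    reference = Path(r"I:\AI\alltalk_tts\models\xtts\xttsv2_2.0.3")
    # Hypothetical name for the folder created by "Compact and move model".
    candidate = Path(r"I:\AI\alltalk_tts\models\xtts\my_finetuned_model")
    if reference.is_dir() and candidate.is_dir():
        missing, size_mismatch = compare_model_folders(reference, candidate)
        print("Missing files:", missing or "none")
        print("Size differs:", size_mismatch or "none")
    else:
        print("One of the folders does not exist; check the paths.")
```

An empty "Missing files" list means every file in the reference folder also exists in the new folder.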

Also could you close/reopen the finetuning software and confirm the model is still not showing as selectable?

If none of those steps work, or they show an issue, could you please run the diagnostics (either atsetup.bat or start_diagnostics.bat) and drop the log file here so that I can try to get an understanding of your system setup and see if I can spot anything there.

Thanks