erew123 / alltalk_tts

AllTalk is based on the Coqui TTS engine, similar to the Coqui_tts extension for Text generation webUI, but it supports a variety of advanced features, such as a settings page, low-VRAM support, DeepSpeed, a narrator mode, model finetuning, custom models, and WAV file maintenance. It can also be used with third-party software via JSON calls.
GNU Affero General Public License v3.0

len(DataLoader) returns 0. Make sure your dataset is not empty or len(dataset) > 0. #271

Closed SuperMaximus1984 closed 2 months ago

SuperMaximus1984 commented 2 months ago

Hi! Finetuning fails with the error below. The dataset (stage 1) was created without a problem. Could you please help me solve it?

>> DVAE weights restored from: D:\PythonProjects\alltalk_tts\models\xttsv2_2.0.2\dvae.pth
 | > Found 6 files in D:\PythonProjects\alltalk_tts\finetune\tmp-trn
 > Training Environment:
 | > Backend: Torch
 | > Mixed precision: False
 | > Precision: float32
 | > Current device: 0
 | > Num. of GPUs: 1
 | > Num. of CPUs: 24
 | > Num. of Torch Threads: 1
 | > Torch seed: 1
 | > Torch CUDNN: True
 | > Torch CUDNN deterministic: False
 | > Torch CUDNN benchmark: False
 | > Torch TF32 MatMul: False
 > Start Tensorboard: tensorboard --logdir=D:\PythonProjects\alltalk_tts\finetune\tmp-trn\training\XTTS_FT-July-12-2024_05+06PM-1a8a20f

 > Model has 517360175 parameters

 > EPOCH: 0/20
 --> D:\PythonProjects\alltalk_tts\finetune\tmp-trn\training\XTTS_FT-July-12-2024_05+06PM-1a8a20f
 > Sampling by language: dict_keys(['en'])

 > TRAINING (2024-07-12 17:06:25)
[!] Warning: The text length exceeds the character limit of 250 for language 'en', this might cause truncated audio.
[!] Warning: The text length exceeds the character limit of 250 for language 'en', this might cause truncated audio.
[!] Warning: The text length exceeds the character limit of 250 for language 'en', this might cause truncated audio.
[!] Warning: The text length exceeds the character limit of 250 for language 'en', this might cause truncated audio.
[!] Warning: The text length exceeds the character limit of 250 for language 'en', this might cause truncated audio.

   --> TIME: 2024-07-12 17:07:08 -- STEP: 0/2 -- GLOBAL_STEP: 0
     | > loss_text_ce: 0.028337828814983368  (0.028337828814983368)
     | > loss_mel_ce: 3.8821115493774414  (3.8821115493774414)
     | > loss: 3.910449266433716  (3.910449266433716)
     | > grad_norm: 0  (0)
     | > current_lr: 5e-06
     | > step_time: 0.8454  (0.8453788757324219)
     | > loader_time: 42.5951  (42.595067262649536)

 > Filtering invalid eval samples!!
 > Total eval samples after filtering: 0
Traceback (most recent call last):
  File "D:\PythonProjects\alltalk_tts\alltalk_environment\env\Lib\site-packages\trainer\trainer.py", line 1833, in fit
    self._fit()
  File "D:\PythonProjects\alltalk_tts\alltalk_environment\env\Lib\site-packages\trainer\trainer.py", line 1787, in _fit
    self.eval_epoch()
  File "D:\PythonProjects\alltalk_tts\alltalk_environment\env\Lib\site-packages\trainer\trainer.py", line 1628, in eval_epoch
    self.get_eval_dataloader(
  File "D:\PythonProjects\alltalk_tts\alltalk_environment\env\Lib\site-packages\trainer\trainer.py", line 990, in get_eval_dataloader
    return self._get_loader(
           ^^^^^^^^^^^^^^^^^
  File "D:\PythonProjects\alltalk_tts\alltalk_environment\env\Lib\site-packages\trainer\trainer.py", line 914, in _get_loader
    len(loader) > 0
AssertionError:  ❗ len(DataLoader) returns 0. Make sure your dataset is not empty or len(dataset) > 0.
erew123 commented 2 months ago

Hi @SuperMaximus1984

The issue you have is somehow related to the samples you are using:

 > Filtering invalid eval samples!!
 > Total eval samples after filtering: 0

Obviously, I don't know what your samples are like, how good the quality is, etc., but my best guess is that it has not been able to automatically break down the ORIGINAL samples you supplied in step one:

[!] Warning: The text length exceeds the character limit of 250 for language 'en', this might cause truncated audio.
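Those warnings mean Whisper produced transcript segments longer than the model's 250-character limit for English, and such rows can then be filtered out of the eval set entirely. A quick way to spot the offending rows (a hedged sketch: it assumes the LJSpeech-style pipe-delimited metadata CSV with `audio_file|text|speaker_name` columns that the finetune step typically writes; demonstrated here on an inline string rather than a real file):

```python
import csv
import io

def overlong_rows(csv_text, limit=250):
    """Return the audio files whose transcript exceeds the character limit."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter="|")
    return [row["audio_file"] for row in reader if len(row["text"]) > limit]

# Hypothetical metadata contents; clip2's transcript is ~360 characters.
long_text = "lorem ipsum " * 30
sample = (
    "audio_file|text|speaker_name\n"
    "wavs/clip1.wav|A short line.|spk\n"
    f"wavs/clip2.wav|{long_text}|spk\n"
)

print(overlong_rows(sample))  # only the overlong clip is flagged
```

Running the same check over your real `metadata_eval.csv` would show whether every eval row tripped the limit, which would explain the "Total eval samples after filtering: 0" message.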

What I would suggest is that you delete the current training data (there is a button for this on the final step) and manually break your original sample(s) down a bit in Audacity or a similar audio editing package, then place those smaller samples in the "put-voice-samples-in-here" directory. For example, if your original training sample is 10 minutes long, you could break it into 5-10 smaller samples, which helps the transcription step split them further into smaller clips. After that, re-run step 1.
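Audacity works fine for this; for a scripted alternative, here is a minimal standard-library sketch that chops a WAV into fixed-length chunks. The function name, chunk length, and the synthetic silent demo clip are all illustrative choices, not part of AllTalk:

```python
import os
import tempfile
import wave

def split_wav(path, out_dir, chunk_seconds=30):
    """Split a WAV file into fixed-length chunks; returns the chunk paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = src.getframerate() * chunk_seconds
        idx = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            out_path = os.path.join(out_dir, f"chunk_{idx:03d}.wav")
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # nframes is corrected on close
                dst.writeframes(frames)
            paths.append(out_path)
            idx += 1
    return paths

# Demo on a synthetic 2-second silent mono clip (stands in for a long voice sample).
tmp = tempfile.mkdtemp()
src_path = os.path.join(tmp, "sample.wav")
with wave.open(src_path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(22050)
    w.writeframes(b"\x00\x00" * 22050 * 2)

chunks = split_wav(src_path, os.path.join(tmp, "chunks"), chunk_seconds=1)
```

Splitting on fixed boundaries can cut mid-word, so if you script it, aim to cut at silences; for best results, a manual pass in an audio editor is still preferable.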

I have introduced code in AllTalk V2's finetuning that works around this issue, so you can also look at using the V2 BETA.

Thanks

SuperMaximus1984 commented 2 months ago

@erew123 Thanks, I'll try V2! By the way, I deleted the tmp data, switched the model to Whisper Large-v3, and was able to run the training smoothly.