Unexpected issue: float division by zero

Fyphen1223 commented 7 months ago

When I first start training, it fails with a float division by zero. What is going on?

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard
Reusing TensorBoard on port 6006 (pid 7752), started 0:15:04 ago. (Use '!kill 7752' to kill it.)
2023-12-05 00:47:05.105294: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-05 00:47:05.105367: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-05 00:47:05.105405: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-05 00:47:05.117685: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-05 00:47:07.518213: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Batch size per GPU : 8
/usr/local/lib/python3.10/dist-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
----------0----------
2023-12-05 00:47:17,370 - INFO - Start from 32k pretrain model: ./vits_pretrain/sovits5.0.pretrain.pth
2023-12-05 00:47:18,418 - INFO - Starting new training run.
----------0----------
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Validation loop: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/content/so-vits-svc-5.0/svc_trainer.py", line 41, in <module>
    train(0, args, args.checkpoint_path, hp, hp_str)
  File "/content/so-vits-svc-5.0/vits_extend/train.py", line 160, in train
    validate(hp, args, model_g, model_d, valloader, stft, writer, step, device)
  File "/content/so-vits-svc-5.0/vits_extend/validation.py", line 44, in validate
    mel_loss = mel_loss / len(valloader.dataset)
ZeroDivisionError: float division by zero

Gabibing commented 7 months ago

It seems that there is no validation dataset. If you don't want to add a validation dataset, just disable the validation phase.

Comment out line 160 in content/so-vits-svc-5.0/vits_extend/train.py:
# validate(hp, args, model_g, model_d, valloader, stft, writer, step, device)
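
As an alternative to commenting the call out, the division that fails in the traceback could be guarded. This is only a minimal sketch, not the repository's code: `mean_validation_loss` is a hypothetical helper, and it assumes the caller is happy to receive `None` when the validation set is empty.

```python
def mean_validation_loss(total_loss, num_samples):
    """Average a summed loss over the number of validation samples.

    Hypothetical helper (not part of so-vits-svc-5.0). Returns None
    instead of raising ZeroDivisionError when there are no samples.
    """
    if num_samples == 0:
        # Empty validation set: there is nothing to average.
        return None
    return total_loss / num_samples


if __name__ == "__main__":
    # With an empty validation set the guard returns None instead of crashing.
    print(mean_validation_loss(0.0, 0))
    # With three samples the summed loss is averaged as usual.
    print(mean_validation_loss(6.0, 3))
```

A guard like this only hides the symptom. An empty validation set usually means the preprocessing step produced no file list, which is worth checking first.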

Fyphen1223 commented 7 months ago

Thank you for your answer, but how can I make or add a validation dataset?

Fyphen1223 commented 7 months ago

And when I disabled that line, the training code just says 0it/s and does not do anything. It only creates sovits5.0-xxx.ckpt.

Fyphen1223 commented 7 months ago

Is this behaviour ok?

Gabibing commented 7 months ago

You should check whether your data was prepared correctly. I think you don't have a valid "files/valid.txt" in this case. You can create that file by running prepare/preprocess_train.py.
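
A quick way to check this before training is to count the entries in the list file. The sketch below assumes the file holds one entry per non-blank line, which this thread does not confirm; `count_listed_files` is a hypothetical helper, not something the repository provides.

```python
from pathlib import Path


def count_listed_files(list_path):
    """Count non-blank lines in a file list such as files/valid.txt.

    Hypothetical helper. Assumes one entry per line (an assumption about
    the file format). Returns 0 when the file does not exist.
    """
    path = Path(list_path)
    if not path.is_file():
        return 0
    with path.open(encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


if __name__ == "__main__":
    entries = count_listed_files("files/valid.txt")
    if entries == 0:
        print("files/valid.txt is missing or empty; rerun preprocessing.")
    else:
        print(f"files/valid.txt lists {entries} entries.")
```

A count of zero would be consistent with the "Validation loop: 0it" line in the log above.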

Fyphen1223 commented 7 months ago

Um, I did run it in the official Colab, and it just says 0/0 0it/s. What is going on?

Gabibing commented 7 months ago

To be precise, the validation set is used for validation, not for training. So if your model trains well, you can skip the validation phase (but the result isn't guaranteed). I can't say what the exact problem is because there could be many reasons for it. It may be due to preprocessing, the dataset path, or other problems. I hope you find the cause.

Fyphen1223 commented 7 months ago

Um, I think I figured it out. I hadn't zipped the dataset correctly. Thanks @Gabibing for supporting me. For anyone looking for an answer to this problem:

You must create folders named speaker0 through speakerxxx in the dataset_raw folder. In each speakerxxx folder, put the dataset files (audio files). I think most popular audio formats are accepted in this repository.
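
The layout described above can be sanity-checked before preprocessing. This is a rough sketch under assumptions: `audio_files_per_speaker` is a hypothetical helper, and the extension set is only an illustrative guess, not a list of what the repository actually accepts.

```python
from pathlib import Path

# Assumption: illustrative extensions only; adjust to the formats you use.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac"}


def audio_files_per_speaker(dataset_root):
    """Map each speaker folder under dataset_root to its audio file count.

    Hypothetical helper. Only direct subfolders are treated as speakers,
    and only files directly inside them are counted.
    """
    root = Path(dataset_root)
    counts = {}
    if not root.is_dir():
        return counts
    for speaker_dir in sorted(root.iterdir()):
        if not speaker_dir.is_dir():
            continue  # Loose files next to the speaker folders are ignored.
        counts[speaker_dir.name] = sum(
            1
            for item in speaker_dir.iterdir()
            if item.is_file() and item.suffix.lower() in AUDIO_EXTENSIONS
        )
    return counts


if __name__ == "__main__":
    counts = audio_files_per_speaker("dataset_raw")
    if not counts:
        print("No speaker folders found under dataset_raw.")
    for speaker, count in counts.items():
        print(f"{speaker}: {count} audio file(s)")
```

An empty result, or a speaker folder with a count of zero, would point to the kind of packaging mistake described in this comment.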

PlayVoice / whisper-vits-svc

Unexpected issue: float division by zero #150