FENRlR / MB-iSTFT-VITS2

Application of MB-iSTFT-VITS components to vits2_pytorch
MIT License

“ZeroDivisionError: integer division or modulo by zero” in Google Colab #8

Closed · Bohemian-self closed 11 months ago

Bohemian-self commented 11 months ago

I trained a bilingual Chinese-Japanese MB-iSTFT-VITS2 model on a custom dataset in Google Colab (Python 3.10, Torch 1.12.1, Torchvision 0.13.1), but training failed with the following error:

/content/MB-iSTFT-VITS2/dataset_raw/auwa2-MB-iSTFT-VITS2
INFO:auwa2-MiVITS-W1:{'train': {'log_interval': 200, 'eval_interval': 1000, 'seed': 1234, 'epochs': 20000, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 32, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 8192, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0, 'fft_sizes': [384, 683, 171], 'hop_sizes': [30, 60, 10], 'win_lengths': [150, 300, 60], 'window': 'hann_window'}, 'data': {'use_mel_posterior_encoder': True, 'training_files': '/content/MB-iSTFT-VITS2/dataset_raw/auwa2-MB-iSTFT-VITS2/character_scripts.txt.cleaned', 'validation_files': '/content/MB-iSTFT-VITS2/dataset_raw/auwa2-MB-iSTFT-VITS2/character_scripts_val.txt.cleaned', 'text_cleaners': ['zh_ja_mixture_cleaners'], 'max_wav_value': 32768.0, 'sampling_rate': 22050, 'filter_length': 1024, 'hop_length': 256, 'win_length': 1024, 'n_mel_channels': 80, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 0, 'cleaned_text': True}, 'model': {'use_mel_posterior_encoder': True, 'use_transformer_flows': True, 'transformer_flow_type': 'pre_conv', 'use_spk_conditioned_encoder': False, 'use_noise_scaled_mas': True, 'use_duration_discriminator': True, 'ms_istft_vits': False, 'mb_istft_vits': True, 'istft_vits': False, 'subbands': 4, 'gen_istft_n_fft': 16, 'gen_istft_hop_size': 4, 'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [4, 4], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16], 'n_layers_q': 3, 'use_spectral_norm': False, 'use_sdp': False}, 'model_dir': './logs/models/auwa2-MiVITS-W1'}
2023-09-28 13:38:10.745201: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
DEBUG:tensorflow:Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.
2023-09-28 13:38:12.103548: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:jaxlib.mlir._mlir_libs:Initializing MLIR with module: _site_initialize_0
DEBUG:jaxlib.mlir._mlir_libs:Registering dialects from initializer <module 'jaxlib.mlir._mlir_libs._site_initialize_0' from '/usr/local/lib/python3.10/dist-packages/jaxlib/mlir/_mlir_libs/_site_initialize_0.so'>
DEBUG:jax._src.xla_bridge:No jax_plugins namespace packages available
DEBUG:jax._src.path:etils.epath found. Using etils.epath for file I/O.
INFO:numexpr.utils:NumExpr defaulting to 2 threads.
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
Using mel posterior encoder for VITS2
/root/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py:563: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
Using transformer flows pre_conv for VITS2
Using normal encoder for VITS1 (cuz it's single speaker after all)
Using noise scaled MAS for VITS2
Using duration discriminator for VITS2
Mutli-band iSTFT VITS2
Loading training data: 0% 0/7 [00:00<?, ?it/s]
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7addb89dc3a0>
Traceback (most recent call last):
  File "/root/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1510, in __del__
    self._shutdown_workers()
  File "/root/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1441, in _shutdown_workers
    if not self._shutdown:
AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute '_shutdown'
Traceback (most recent call last):
  File "/content/MB-iSTFT-VITS2/train.py", line 461, in <module>
    main()
  File "/content/MB-iSTFT-VITS2/train.py", line 55, in main
    mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,))
  File "/root/.local/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/root/.local/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 198, in start_processes
    while not context.join():
  File "/root/.local/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 160, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/root/.local/lib/python3.10/site-packages/torch/multiprocessing/spawn.py", line 69, in _wrap
    fn(i, *args)
  File "/content/MB-iSTFT-VITS2/train.py", line 207, in run
    train_and_evaluate(rank, epoch, hps, [net_g, net_d, net_dur_disc], [optim_g, optim_d, optim_dur_disc],
  File "/content/MB-iSTFT-VITS2/train.py", line 240, in train_and_evaluate
    for batch_idx, (x, x_lengths, spec, spec_lengths, y, y_lengths) in enumerate(loader):
  File "/usr/local/lib/python3.10/dist-packages/tqdm/std.py", line 1182, in __iter__
    for obj in iterable:
  File "/root/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 444, in __iter__
    return self._get_iterator()
  File "/root/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 390, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/root/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1038, in __init__
    super(_MultiProcessingDataLoaderIter, self).__init__(loader)
  File "/root/.local/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 651, in __init__
    self._sampler_iter = iter(self._index_sampler)
  File "/content/MB-iSTFT-VITS2/data_utils.py", line 400, in __iter__
    ids_bucket = ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[:(rem % len_bucket)]
ZeroDivisionError: integer division or modulo by zero

I hope you can give me some ideas for troubleshooting this problem. Thank you very much!

FENRlR commented 11 months ago

That happens when len_bucket is zero. At first I suspected a PyTorch version mismatch, but I failed to replicate the issue on Colab.
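To see how len_bucket can end up zero, here is a minimal, self-contained sketch of the bucketing step in a VITS-style DistributedBucketSampler. The boundaries and lengths below are illustrative values, not necessarily what this repo's train.py passes; the point is only that a bucket whose length range matches no utterance stays empty:

# Minimal sketch (illustrative values): each utterance is assigned to the
# bucket whose (low, high] length range contains its spectrogram frame count.
boundaries = [32, 300, 400, 500, 600, 700, 800, 900, 1000]  # hypothetical
lengths = [120, 150, 180, 950]  # frame counts of a tiny example dataset

buckets = [[] for _ in range(len(boundaries) - 1)]
for idx, length in enumerate(lengths):
    for i in range(len(boundaries) - 1):
        if boundaries[i] < length <= boundaries[i + 1]:
            buckets[i].append(idx)
            break

print([len(b) for b in buckets])  # [3, 0, 0, 0, 0, 0, 0, 1]
# Every bucket with len(b) == 0 later hits `rem // len_bucket` in
# __iter__ (data_utils.py line 400) and raises ZeroDivisionError.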

Assuming your symbol set matches the cleaner, one robust way to bypass this issue is simply to skip line 400 when len_bucket is zero:

# data_utils.py, around line 400, inside DistributedBucketSampler.__iter__
if len_bucket > 0:
    ids_bucket = ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[:(rem % len_bucket)]
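With that guard in place, an empty bucket simply contributes no batches instead of crashing the sampler.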

Or, if you are using librosa >= 0.10.1, try editing mel_processing.py (lines 83 and 101), changing

mel = librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)

to

mel = librosa_mel_fn(sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax)
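For reference, that edit is needed because newer librosa (0.10+) made these arguments keyword-only. A quick self-contained check, with values taken from the 'data' section of your config (librosa_mel_fn is the librosa.filters.mel alias that mel_processing.py imports):

from librosa.filters import mel as librosa_mel_fn  # same alias as in mel_processing.py

# Values taken from the config dump above.
sampling_rate, n_fft, num_mels, fmin, fmax = 22050, 1024, 80, 0.0, None

# librosa < 0.10 accepted positional arguments:
#   mel = librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax)
# librosa >= 0.10 requires keywords, so pass everything by name:
mel = librosa_mel_fn(sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax)
print(mel.shape)  # (80, 513), i.e. (n_mels, n_fft // 2 + 1)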

Bohemian-self commented 11 months ago

Thanks, it works!