YYuX-1145 / Bert-VITS2-Integration-package

vits2 backbone with bert
https://www.bilibili.com/video/BV13p4y1d7v9
GNU Affero General Public License v3.0
332 stars 30 forks

Training runs empty; loading DUR_0.pth reports "error, norm_1.gamma is not in the checkpoint" #19

Open jnc-nj opened 1 year ago

jnc-nj commented 1 year ago
root@36025d9f6349:/workspace/vits2# python train_ms.py -c ./configs/config.json 
INFO:OUTPUT_MODEL:{'train': {'log_interval': 200, 'eval_interval': 1000, 'seed': 52, 'epochs': 10000, 'learning_rate': 0.0003, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 24, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 16384, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'use_mel_posterior_encoder': False, 'training_files': 'filelists/train.list', 'validation_files': 'filelists/val.list', 'max_wav_value': 32768.0, 'sampling_rate': 44100, 'filter_length': 2048, 'hop_length': 512, 'win_length': 2048, 'n_mel_channels': 128, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 1, 'cleaned_text': True, 'spk2id': {'whale': 0}}, 'model': {'use_spk_conditioned_encoder': True, 'use_noise_scaled_mas': True, 'use_mel_posterior_encoder': False, 'use_duration_discriminator': True, 'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 8, 2, 2], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 256}, 'model_dir': './logs/./OUTPUT_MODEL', 'cont': False}
INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes.
skipped:  8 , total:  686
skipped:  0 , total:  4
Using noise scaled MAS for VITS2
Using duration discriminator for VITS2
256 2
256 2
256 2
256 2
256 2
./logs/./OUTPUT_MODEL/DUR_0.pth
error, norm_1.gamma is not in the checkpoint
error, norm_1.beta is not in the checkpoint
error, norm_2.gamma is not in the checkpoint
error, norm_2.beta is not in the checkpoint
error, cond.weight is not in the checkpoint
error, cond.bias is not in the checkpoint
load 
INFO:OUTPUT_MODEL:Loaded checkpoint './logs/./OUTPUT_MODEL/DUR_0.pth' (iteration 694)
./logs/./OUTPUT_MODEL/G_0.pth
error, emb_g.weight is not in the checkpoint
load 
INFO:OUTPUT_MODEL:Loaded checkpoint './logs/./OUTPUT_MODEL/G_0.pth' (iteration 0)
./logs/./OUTPUT_MODEL/D_0.pth
PytorchStreamReader failed reading zip archive: failed finding central directory
0it [00:00, ?it/s]
INFO:OUTPUT_MODEL:====> Epoch: 1
/usr/local/lib/python3.8/dist-packages/torch/optim/lr_scheduler.py:139: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
0it [00:00, ?it/s]
INFO:OUTPUT_MODEL:====> Epoch: 2
0it [00:00, ?it/s]
INFO:OUTPUT_MODEL:====> Epoch: 3
0it [00:00, ?it/s]
INFO:OUTPUT_MODEL:====> Epoch: 4
0it [00:00, ?it/s]

I tried both the original project's DUR_0 and the DUR_0 from the release, and both show this problem. What could be causing it?

jnc-nj commented 1 year ago

I took a look. Could it be this line? PytorchStreamReader failed reading zip archive: failed finding central directory
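A `failed finding central directory` message usually indicates that the file is not a complete zip archive, and checkpoints written by recent versions of `torch.save` are zip archives. The following is a minimal sketch for checking a checkpoint before training, using only the standard library. The helper name `looks_like_zip_checkpoint` is hypothetical, and a checkpoint saved in the older non-zip format would also report False here.

```python
import os
import zipfile


def looks_like_zip_checkpoint(path):
    """Return True if `path` exists, is non-empty, and is a readable zip archive."""
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False
    # is_zipfile looks for the end-of-central-directory record,
    # which is what a truncated download is missing.
    return zipfile.is_zipfile(path)
```

Running it on each of `DUR_0.pth`, `G_0.pth`, and `D_0.pth` in the model directory shows which file, if any, needs to be downloaded again.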

YYuX-1145 commented 12 months ago

https://github.com/YYuX-1145/Bert-VITS2-Integration-package/issues/11

jnc-nj commented 12 months ago

11

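Thanks for the reply. I just verified my samples: all of them are 5-10 seconds, mono, 44.1 kHz. The samples did not go through AU; they went straight through the built-in scripts. Is there anything else I should check?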

jnc-nj commented 12 months ago

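Observation: in train_ms.py, train_loader is empty inside train_and_evaluate() for some reason.

Observation: the dataloader is empty, but the length of train_dataset is normal.

Observation: the dataloader is empty because torch fails silently when it encounters corrupt data and returns an empty [].

Attempt: print audiopath, delete or repair the corrupt files in the source material, then rerun transcribe.

Result: failed. The problem is not here; there must be another reason the dataloader returns empty.

Question: in DistributedBucketSampler, the values in dataset.lengths fall outside boundaries, and boundaries is hardcoded. Can it be changed?

Attempt: manually changing the sampler's boundaries parameter to fit dataset.lengths works.

Result: success.

Fix: match boundaries automatically to dataset.lengths (that is, redistribute the buckets according to lengths).

Fix: when building the dataset, max_text_len should match the upper limit on the total phoneme length of the text.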
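The two fixes above can be checked before starting a run. The sketch below counts how many samples a text-length limit would drop and how many spectrogram lengths fall outside the sampler's boundaries. It assumes a sample is bucketed only when `boundaries[0] < length <= boundaries[-1]`, as in the upstream VITS sampler; the function name and all numbers are illustrative and not taken from this repository.

```python
def diagnose(text_lens, spec_lens, boundaries, max_text_len):
    """Count samples lost to the text-length filter and to bucket boundaries.

    Assumption: a sample is bucketed only if
    boundaries[0] < spec_len <= boundaries[-1].
    """
    too_long_text = sum(1 for n in text_lens if n > max_text_len)
    lo, hi = boundaries[0], boundaries[-1]
    out_of_range = sum(1 for n in spec_lens if not (lo < n <= hi))
    return {
        "total": len(spec_lens),
        "text_over_limit": too_long_text,
        "spec_outside_boundaries": out_of_range,
    }


# Illustrative numbers only: every spectrogram length exceeds the last
# boundary, so no sample lands in a bucket and the loader yields no batches.
report = diagnose(
    text_lens=[120, 450, 280],
    spec_lens=[1200, 1500, 1350],
    boundaries=[32, 300, 400, 500, 600, 700, 800, 900, 1000],
    max_text_len=300,
)
print(report)
```

If `spec_outside_boundaries` equals `total`, the dataset length looks normal while the loader still produces `0it`, which matches the symptom in the log.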

YYuX-1145 commented 12 months ago

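Wait, I just noticed that you are most likely training in the cloud. In that case it is more likely an environment problem; mismatched dependency versions cause all sorts of strange issues. Also, I have barely touched requirements.txt since then, and it is not synced with the original project.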

jnc-nj commented 12 months ago

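No, I dockerized it and run it on my own physical server. I have already located and solved the problem. There were two main points:

  1. WAV files with different encodings differ in length relative to their phonemes, so the hardcoded boundaries cannot cover them, and the dataset gets emptied at the sampler stage. The fix is to adjust boundaries manually; on my side I wrote an automatic adaptation instead.
  2. The phoneme length exceeds max_text_len, which defaults to 300. Long Chinese texts generally exceed this limit; raising it to around 3000 is enough.

With my changes, training on arbitrary audio files and running inference both work normally.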
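The automatic adaptation itself is not shown in the thread. As one possible shape for it, here is a minimal sketch that derives evenly spaced boundaries from the observed lengths so that every sample falls inside `(boundaries[0], boundaries[-1]]`. The name `auto_boundaries` and the default of 8 buckets are assumptions for illustration.

```python
import math


def auto_boundaries(lengths, num_buckets=8):
    """Evenly spaced integer boundaries that cover every length.

    The first boundary sits just below the shortest sample and the last
    boundary is at or above the longest one.
    """
    if not lengths:
        raise ValueError("lengths is empty")
    lo = min(lengths) - 1
    hi = max(lengths)
    step = max(1, math.ceil((hi - lo) / num_buckets))
    boundaries = list(range(lo, hi, step))
    boundaries.append(boundaries[-1] + step)  # guarantees last >= hi
    return boundaries
```

Evenly spaced buckets keep the padding inside a batch bounded by `step`, at the cost of uneven bucket populations when the length distribution is skewed.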

GEORGE-Ta commented 12 months ago

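Strange: the errors are still reported, but the generated model still works and the results are decent.

I added a boundaries adjustment. Is this right? Even with it, the errors at the bottom still appear. The audio is 44100 Hz, mono, and I have checked the durations as well.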


# Assumes `import numpy as np` at module level.
def __init__(self, dataset, batch_size, boundaries=None, num_replicas=None, rank=None, shuffle=True):
    super().__init__(dataset, num_replicas=num_replicas, rank=rank, shuffle=shuffle)
    self.lengths = dataset.lengths
    self.batch_size = batch_size

    # If no boundaries are provided, compute them dynamically
    if boundaries is None:
        self.boundaries = self.compute_dynamic_boundaries()
    else:
        self.boundaries = boundaries

    self.buckets, self.num_samples_per_bucket = self._create_buckets()
    self.total_size = sum(self.num_samples_per_bucket)
    self.num_samples = self.total_size // self.num_replicas

def compute_dynamic_boundaries(self):

    # Compute percentiles from the audio lengths in the dataset
    boundaries = list(np.percentile(self.lengths, [10, 20, 30, 40, 50, 60, 70, 80, 90]))

    return boundaries

The printed Boundaries are [659, 746, 773, 795, 814, 840, 862, 901, 946]
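One thing worth checking with percentile-based boundaries: if the sampler keeps only lengths in `(boundaries[0], boundaries[-1]]` (an assumption based on the upstream VITS sampler), cutting at the 10th and 90th percentiles leaves roughly the shortest and longest tenth of the data outside every bucket. A hedged variant that extends the range to the full dataset is sketched here; `covering_percentile_boundaries` is a hypothetical name.

```python
import numpy as np


def covering_percentile_boundaries(lengths, qs=(10, 20, 30, 40, 50, 60, 70, 80, 90)):
    """Percentile boundaries extended to cover min and max.

    Returns a strictly increasing list of ints with
    boundaries[0] < min(lengths) and boundaries[-1] == max(lengths).
    """
    lengths = np.asarray(lengths)
    inner = np.percentile(lengths, qs).astype(int).tolist()
    lo, hi = int(lengths.min()) - 1, int(lengths.max())
    # Deduplicate and keep only interior cut points, then add the two ends.
    cuts = sorted({b for b in inner if lo < b < hi})
    return [lo] + cuts + [hi]
```

Casting to `int` and deduplicating also avoids float boundaries and zero-width buckets when many clips share the same length.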

F:\Programming_Project\AI\Voice\0903更新\Bert-VITS2-Integration-Package-0903\Bert-VITS2-Integration-Package>%PYTHON% train_ms.py -c ./configs\config.json Utils.py file is being executed Utils.py file is being executed INFO:OUTPUT_MODEL:{'train': {'log_interval': 10, 'eval_interval': 100, 'seed': 52, 'epochs': 1000, 'learning_rate': 0.0002, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 16, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 16384, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'use_mel_posterior_encoder': False, 'training_files': 'filelists/train.list', 'validation_files': 'filelists/val.list', 'max_wav_value': 32768.0, 'sampling_rate': 44100, 'filter_length': 2048, 'hop_length': 512, 'win_length': 2048, 'n_mel_channels': 128, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 1, 'cleaned_text': True, 'spk2id': {'spring': 0}}, 'model': {'use_spk_conditioned_encoder': True, 'use_noise_scaled_mas': True, 'use_mel_posterior_encoder': False, 'use_duration_discriminator': True, 'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 8, 2, 2], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 256}, 'model_dir': './logs\./OUTPUT_MODEL', 'cont': False} WARNING:OUTPUT_MODEL:F:\Programming_Project\AI\Voice\0903更新\Bert-VITS2-Integration-Package-0903\Bert-VITS2-Integration-Package is not a git repository, therefore hash value comparison will be ignored. INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. 
skipped: 0 , total: 602 skipped: 0 , total: 4 Using noise scaled MAS for VITS2 Using duration discriminator for VITS2 256 2 256 2 256 2 256 2 256 2 ./logs./OUTPUT_MODEL\DUR_0.pth 中文中文Checkpoint path: ./logs./OUTPUT_MODEL\DUR_0.pth error, norm_1.gamma is not in the checkpoint error, norm_1.beta is not in the checkpoint error, norm_2.gamma is not in the checkpoint error, norm_2.beta is not in the checkpoint error, cond.weight is not in the checkpoint error, cond.bias is not in the checkpoint load INFO:OUTPUT_MODEL:Loaded checkpoint './logs./OUTPUT_MODEL\DUR_0.pth' (iteration 694) ./logs./OUTPUT_MODEL\G_0.pth 中文中文Checkpoint path: ./logs./OUTPUT_MODEL\G_0.pth error, emb_g.weight is not in the checkpoint load INFO:OUTPUT_MODEL:Loaded checkpoint './logs./OUTPUT_MODEL\G_0.pth' (iteration 0) ./logs./OUTPUT_MODEL\D_0.pth 中文中文Checkpoint path: ./logs./OUTPUT_MODEL\D_0.pth load INFO:OUTPUT_MODEL:Loaded checkpoint './logs./OUTPUT_MODEL\D_0.pth' (iteration 0) Utils.py file is being executed Utils.py file is being executed

YYuX-1145 commented 12 months ago

The errors above are normal when loading the base model. As long as training does not run empty, there is no problem.
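These messages are consistent with a loader that copies parameters key by key and keeps the model's own initial value whenever a key is absent from the base model. The sketch below illustrates that pattern with plain dictionaries standing in for state dicts; it is not the repository's `load_checkpoint`, and the function name is hypothetical.

```python
def merge_state_dict(model_state, checkpoint_state):
    """Prefer checkpoint values; fall back to the model's own values.

    Returns the merged dict and the list of keys missing from the checkpoint.
    """
    merged, missing = {}, []
    for key, value in model_state.items():
        if key in checkpoint_state:
            merged[key] = checkpoint_state[key]
        else:
            missing.append(key)
            merged[key] = value  # keep the freshly initialized parameter
    return merged, missing


# Plain floats stand in for tensors in this illustration.
model_state = {"conv.weight": 0.0, "norm_1.gamma": 1.0}
checkpoint_state = {"conv.weight": 0.5}
merged, missing = merge_state_dict(model_state, checkpoint_state)
for key in missing:
    print(f"error, {key} is not in the checkpoint")
```

Under this pattern a missing key is only a notice: the layer starts from its initial value and is trained from there, which is why the resulting model can still work.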

GEORGE-Ta commented 12 months ago

Oh, okay. Thank you very much.

jnc-nj commented 12 months ago

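@YYuX-1145 I think this change should be integrated into the library; that would do the most to spare ordinary users from errors caused by nonconforming audio files. @GEORGE-Ta's boundaries assignment should be OK. The only thing that may need consideration is that assigning by percentile could make a single bucket too large and run out of GPU memory.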

YYuX-1145 commented 12 months ago

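In my tests, this error does occur once I cut the Genshin voice lines down to a single-digit number of clips. But if you want it changed, you should submit the code change to the original project. Also, this code is the same as the finetune code, and I cannot be sure whether modifying it would have other effects.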

qin-tain commented 11 months ago

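@YYuX-1145 @jnc-nj Could you consider creating a branch with the fix? It does not run on Google Colab. Could you make a version that runs on Google Colab? Without an NVIDIA GPU I have no other option, and beginners like me cannot make sense of those audio file properties.

Below is the error output from the training step on Google Colab. I may also have made a mistake in another step: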

2023-09-25 13:07:51.069944: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations. To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags. 2023-09-25 13:07:52.165830: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT 2023-09-25 13:08:07.856653: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT INFO:OUTPUT_MODEL:{'train': {'log_interval': 200, 'eval_interval': 1000, 'seed': 52, 'epochs': 10000, 'learning_rate': 0.0003, 'betas': [0.8, 0.99], 'eps': 1e-09, 'batch_size': 24, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 16384, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'use_mel_posterior_encoder': False, 'training_files': 'filelists/train.list', 'validation_files': 'filelists/val.list', 'max_wav_value': 32768.0, 'sampling_rate': 44100, 'filter_length': 2048, 'hop_length': 512, 'win_length': 2048, 'n_mel_channels': 128, 'mel_fmin': 0.0, 'mel_fmax': None, 'add_blank': True, 'n_speakers': 1, 'cleaned_text': True, 'spk2id': {'drowranger': 0}}, 'model': {'use_spk_conditioned_encoder': True, 'use_noise_scaled_mas': True, 'use_mel_posterior_encoder': False, 'use_duration_discriminator': True, 'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0.1, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [8, 8, 2, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 8, 2, 2], 'n_layers_q': 3, 'use_spectral_norm': False, 'gin_channels': 256}, 'model_dir': './logs/./OUTPUT_MODEL', 'cont': False} INFO:torch.distributed.distributed_c10d:Added key: store_based_barrier_key:1 to store for rank: 0 
INFO:torch.distributed.distributed_c10d:Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 1 nodes. skipped: 0 , total: 83 skipped: 0 , total: 4 Using noise scaled MAS for VITS2 Using duration discriminator for VITS2 256 2 256 2 256 2 256 2 256 2 ./logs/./OUTPUT_MODEL/DUR_0.pth error, norm_1.gamma is not in the checkpoint error, norm_1.beta is not in the checkpoint error, norm_2.gamma is not in the checkpoint error, norm_2.beta is not in the checkpoint error, cond.weight is not in the checkpoint error, cond.bias is not in the checkpoint load INFO:OUTPUT_MODEL:Loaded checkpoint './logs/./OUTPUT_MODEL/DUR_0.pth' (iteration 694) ./logs/./OUTPUT_MODEL/G_0.pth error, emb_g.weight is not in the checkpoint load INFO:OUTPUT_MODEL:Loaded checkpoint './logs/./OUTPUT_MODEL/G_0.pth' (iteration 0) ./logs/./OUTPUT_MODEL/D_0.pth load INFO:OUTPUT_MODEL:Loaded checkpoint './logs/./OUTPUT_MODEL/D_0.pth' (iteration 0) 0it [00:00, ?it/s]2023-09-25 13:08:21.689617: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT 2023-09-25 13:08:21.689617: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT /usr/local/lib/python3.10/dist-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error. Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:862.) return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined] /usr/local/lib/python3.10/dist-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error. 
Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:862.) return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined] /content/Bert-VITS2-YYuX-1145/mel_processing.py:78: FutureWarning: Pass sr=44100, n_fft=2048, n_mels=128, fmin=0.0, fmax=None as keyword args. From version 0.10 passing these as positional arguments will result in an error mel = librosa_mel_fn(sampling_rate, n_fft, num_mels, fmin, fmax) /usr/local/lib/python3.10/dist-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error. Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:862.) return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined] [W reducer.cpp:1300] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. 
(function operator()) 0it [00:44, ?it/s] Traceback (most recent call last): File "/content/Bert-VITS2-YYuX-1145/train_ms.py", line 402, in main() File "/content/Bert-VITS2-YYuX-1145/train_ms.py", line 60, in main mp.spawn(run, nprocs=n_gpus, args=(n_gpus, hps,)) File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 239, in spawn return start_processes(fn, args, nprocs, join, daemon, start_method='spawn') File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 197, in start_processes while not context.join(): File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 160, in join raise ProcessRaisedException(msg, error_index, failed_process.pid) torch.multiprocessing.spawn.ProcessRaisedException: -- Process 0 terminated with the following error: Traceback (most recent call last): File "/usr/local/lib/python3.10/dist-packages/torch/multiprocessing/spawn.py", line 69, in _wrap fn(i, *args) File "/content/Bert-VITS2-YYuX-1145/train_ms.py", line 193, in run train_and_evaluate(rank, epoch, hps, [net_g, net_d, net_dur_disc], [optim_g, optim_d, optim_dur_disc], [scheduler_g, scheduler_d, scheduler_dur_disc], scaler, [train_loader, eval_loader], logger, [writer, writer_eval]) File "/content/Bert-VITS2-YYuX-1145/train_ms.py", line 286, in train_and_evaluate loss_fm = feature_loss(fmap_r, fmap_g) File "/content/Bert-VITS2-YYuX-1145/losses.py", line 13, in feature_loss loss += torch.mean(torch.abs(rl - gl)) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 14.75 GiB total capacity; 13.14 GiB already allocated; 14.81 MiB free; 13.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

YYuX-1145 commented 11 months ago

It clearly ran out of GPU memory. However, I cannot and do not plan to make a version adapted for cloud training.

qin-tain commented 11 months ago

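@YYuX-1145 The later part is an out-of-memory error, but earlier, at DUR_0.pth, there are also errors similar to the one in the title. I assumed 14 GB of GPU memory would be enough, since the audio clips are short, only a few seconds each. I will adjust things and try again.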

qin-tain commented 11 months ago

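It is running successfully now. The batch_size in config.json was probably too large. I pulled directly from the repository, where batch_size is 24, whereas the integration package uses 16; after changing it to 16, training started. As a beginner with no experience, I had only been looking at the settings in the code and had not checked the settings in the config.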
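For anyone scripting the same change, for example in a notebook cell, here is a minimal sketch that rewrites `train.batch_size` in a config file. The key names follow the config dump in the logs above; the helper name `set_batch_size` is hypothetical.

```python
import json


def set_batch_size(config_path, batch_size):
    """Rewrite train.batch_size in a config.json and return the old value."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    old = config["train"]["batch_size"]
    config["train"]["batch_size"] = batch_size
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2, ensure_ascii=False)
    return old
```

Calling `set_batch_size("./configs/config.json", 16)` before launching `train_ms.py` would apply the value that worked here; smaller values trade throughput for lower peak GPU memory.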