Hello, may I ask how many iterations you trained for?

liumingda commented 1 year ago

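1. Hello. In the config file I only changed 16000 to 48000 and left everything else unchanged. I have trained for 730,000 iterations so far, but the inference results are not as good as the samples you provided. How many iterations was the model behind your samples trained for?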

2. Another question: I added my own training set, which requires changing _tones = ["1", "2", "3", "4", "5"] to _tones = ["1", "2", "3", "4", "5", "6"] in text.symbols. With that change, training fails with this error:

exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: The size of tensor a (219) must match the size of tensor b (257) at non-singleton dimension 0

Here exp_avg_shape is torch.Size([219, 192]) and grad_shape is torch.Size([257, 192]). I don't understand why the size of exp_avg was not updated to match. Thank you!

MaxMax2016 commented 1 year ago

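1. It was a while ago and I don't remember exactly; I think it was batch_size 24 and 500K iterations. To tell whether the training time is insufficient, you really have to look at kl_loss and mel_loss. Also, after increasing the sampling rate, segment_size should be increased accordingly.

2. Changing text.symbols amounts to changing the model, so the model has to be trained from scratch. I couldn't locate the code that exp_avg_shape corresponds to; do you have more detailed information?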
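One way to read the segment_size advice, as a sketch only: keep the training segment the same length in seconds when the sampling rate goes up. The starting values below (segment_size 8000 at 16 kHz, hop_length 256) are illustrative assumptions, not values taken from this repository's config, and `scale_segment_size` is a hypothetical helper.

```python
def scale_segment_size(segment_size, old_sr, new_sr, hop_length):
    """Scale segment_size so the segment covers the same duration at new_sr.

    The result is rounded down to a multiple of hop_length, because the
    training code works with segment_size // hop_length frames.
    """
    scaled = segment_size * new_sr // old_sr
    return max(hop_length, scaled // hop_length * hop_length)


# Assumed example values; substitute the ones from your own config.
new_segment_size = scale_segment_size(8000, old_sr=16000, new_sr=48000, hop_length=256)
print(new_segment_size)
```

Whether hop_length and the other STFT settings should also change with the sampling rate is a separate decision that this sketch does not address.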

liumingda commented 1 year ago

Thank you. The code behind exp_avg_shape is in the adamw function in torch > optim > _functional.py. Could you take a look?

MaxMax2016 commented 1 year ago

Please paste the complete error message.

liumingda commented 1 year ago

Traceback (most recent call last):
  File "train.py", line 438, in <module>
    main()
  File "train.py", line 43, in main
    mp.spawn(
  File "/opt/conda/envs/diffsinger/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/opt/conda/envs/diffsinger/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/opt/conda/envs/diffsinger/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/opt/conda/envs/diffsinger/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/notebook/code/personal/S9052934/vits/tts/vits_chinese-master/train.py", line 159, in run
    train_and_evaluate(
  File "/home/notebook/code/personal/S9052934/vits/tts/vits_chinese-master/train.py", line 277, in train_and_evaluate
    scaler.step(optim_g)
  File "/opt/conda/envs/diffsinger/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 310, in step
    return optimizer.step(*args, **kwargs)
  File "/opt/conda/envs/diffsinger/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
    return wrapped(*args, **kwargs)
  File "/opt/conda/envs/diffsinger/lib/python3.8/site-packages/torch/optim/optimizer.py", line 88, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/envs/diffsinger/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/envs/diffsinger/lib/python3.8/site-packages/torch/optim/adamw.py", line 137, in step
    F.adamw(params_with_grad,
  File "/opt/conda/envs/diffsinger/lib/python3.8/site-packages/torch/optim/_functional.py", line 131, in adamw
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
RuntimeError: The size of tensor a (219) must match the size of tensor b (257) at non-singleton dimension 0

That is the complete error message. Thanks for your help!

MaxMax2016 commented 1 year ago

If the network structure has changed, you need to train from scratch, don't you?

liumingda commented 1 year ago

Oh, but I didn't change the network structure.

MaxMax2016 commented 1 year ago

net_g = utils.load_class(hps.train.train_class)(
    len(symbols),
    hps.data.filter_length // 2 + 1,
    hps.train.segment_size // hps.data.hop_length,
    **hps.model,
).cuda(rank)

self.enc_p = TextEncoder(
    n_vocab,
    inter_channels,
    hidden_channels,
    filter_channels,
    n_heads,
    n_layers,
    kernel_size,
    p_dropout,
)

self.emb = nn.Embedding(n_vocab, hidden_channels)
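
The three snippets above trace `len(symbols)` into `n_vocab` and then into the first dimension of the embedding weight, so changing `_tones` changes the shape of that parameter, while state saved by an earlier run keeps the old shape. Below is a minimal sketch of a check that makes such a mismatch visible before resuming, using plain tuples in place of tensor shapes. `find_shape_mismatches` is a hypothetical helper, the key name `enc_p.emb.weight` is inferred from the attribute names above, and the second entry is made up for illustration.

```python
def find_shape_mismatches(saved_shapes, model_shapes):
    """Return {name: (saved_shape, model_shape)} for parameters whose shapes differ.

    Both arguments map parameter names to shape tuples. Names present in
    only one of the two mappings are ignored here.
    """
    mismatches = {}
    for name, saved in saved_shapes.items():
        current = model_shapes.get(name)
        if current is not None and tuple(saved) != tuple(current):
            mismatches[name] = (tuple(saved), tuple(current))
    return mismatches


# Shapes from the error in this thread: 219 symbols before, 257 after.
# "example.bias" is an invented entry that is unaffected by the change.
saved = {"enc_p.emb.weight": (219, 192), "example.bias": (192,)}
model = {"enc_p.emb.weight": (257, 192), "example.bias": (192,)}

for name, (old, new) in find_shape_mismatches(saved, model).items():
    print(f"{name}: checkpoint {old} vs model {new}")
```

With PyTorch, the shape mappings could be built as `{k: tuple(v.shape) for k, v in state_dict.items()}` from the model's `state_dict()` and from the saved weights. Any mismatch means the saved weights and the optimizer's `exp_avg` buffers no longer fit the model, which is consistent with the advice to train from scratch.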

liumingda commented 1 year ago

Oh, thank you! I'll look into it.

godspirit00 commented 1 year ago

To tell whether the training time is insufficient, you really have to look at kl_loss and mel_loss

@MaxMax2016 May I ask how to tell whether the model has been trained long enough, or has already been trained past that point?

MaxMax2016 commented 1 year ago

Once the loss curve has flattened out, the model has been trained enough.
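
A rough way to turn "the curve has flattened" into a check, as a sketch: compare the mean loss over the most recent window of logged values with the mean over the window before it. The window size and the 1% tolerance are arbitrary assumptions, `has_plateaued` is a hypothetical helper, and the sample values are invented for illustration. In practice the kl_loss and mel_loss values would come from the training logs.

```python
from statistics import mean


def has_plateaued(losses, window=50, rel_tol=0.01):
    """Return True if the mean of the last `window` values differs from the
    mean of the preceding `window` values by less than rel_tol (relative)."""
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    previous = mean(losses[-2 * window:-window])
    recent = mean(losses[-window:])
    if previous == 0:
        return recent == 0
    return abs(previous - recent) / abs(previous) < rel_tol


# Invented example: one loss that is still falling, one that has levelled off.
falling = [30.0 - 0.1 * i for i in range(100)]
flat = [20.0 + 0.01 * (i % 2) for i in range(100)]
print(has_plateaued(falling), has_plateaued(flat))
```

A check like this only says the curve is flat over the chosen windows; it does not distinguish "trained enough" from "trained too long", so listening to inference samples remains a useful complement.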

PlayVoice / vits_chinese

Hello, may I ask how many iterations you trained for? #87