PaddlePaddle / PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
https://paddlespeech.readthedocs.io
Apache License 2.0
10.99k stars, 1.83k forks

[TTS] Why does the model trained with examples/aishell3/vits-vc perform poorly? When will trained model files be released? #2483

Closed: liukaiyueyuo closed this issue 2 years ago

liukaiyueyuo commented 2 years ago

Why does the model trained from the examples/aishell3/vits-vc directory perform poorly? I trained for 3 days on a single V100 GPU, and mel_loss will not come down; it keeps hovering around 26. Could you share a model file trained by PaddleSpeech so I can take a look?

Sample log:

INFO 2022-09-20 10:58:28,841 trainer.py:167] iter: 74635/350000, Rank: 0, real_loss: 1.203158, fake_loss: 1.163633, discriminator_loss: 2.366791, generator_loss: 35.696995, generator_mel_loss: 25.517664, generator_kl_loss: 1.996764, generator_dur_loss: 2.306179, generator_adv_loss: 2.221788, generator_feat_match_loss: 3.654603, avg_reader_cost: 0.00034 sec, avg_batch_cost: 2.39460 sec, avg_samples: 50, avg_ips: 20.88035 sequences/sec
INFO 2022-09-20 10:58:31,169 trainer.py:167] iter: 74636/350000, Rank: 0, real_loss: 1.248867, fake_loss: 1.126140, discriminator_loss: 2.375007, generator_loss: 36.339199, generator_mel_loss: 26.522982, generator_kl_loss: 1.727793, generator_dur_loss: 2.320811, generator_adv_loss: 2.271485, generator_feat_match_loss: 3.496129, avg_reader_cost: 0.00029 sec, avg_batch_cost: 2.31373 sec, avg_samples: 50, avg_ips: 21.61016 sequences/sec
INFO 2022-09-20 10:58:33,541 trainer.py:167] iter: 74637/350000, Rank: 0, real_loss: 1.228455, fake_loss: 1.174015, discriminator_loss: 2.402470, generator_loss: 36.243534, generator_mel_loss: 26.122320, generator_kl_loss: 2.049680, generator_dur_loss: 2.297820, generator_adv_loss: 2.229776, generator_feat_match_loss: 3.543939, avg_reader_cost: 0.00036 sec, avg_batch_cost: 2.35778 sec, avg_samples: 50, avg_ips: 21.20636 sequences/sec
INFO 2022-09-20 10:58:35,905 trainer.py:167] iter: 74638/350000, Rank: 0, real_loss: 1.233365, fake_loss: 1.132419, discriminator_loss: 2.365783, generator_loss: 36.313385, generator_mel_loss: 26.273182, generator_kl_loss: 1.906548, generator_dur_loss: 2.294973, generator_adv_loss: 2.265149, generator_feat_match_loss: 3.573530, avg_reader_cost: 0.00030 sec, avg_batch_cost: 2.35022 sec, avg_samples: 50, avg_ips: 21.27463 sequences/sec
INFO 2022-09-20 10:58:38,317 trainer.py:167] iter: 74639/350000, Rank: 0, real_loss: 1.166814, fake_loss: 1.176862, discriminator_loss: 2.343676, generator_loss: 36.301010, generator_mel_loss: 26.083799, generator_kl_loss: 1.860562, generator_dur_loss: 2.323201, generator_adv_loss: 2.231595, generator_feat_match_loss: 3.801853, avg_reader_cost: 0.00038 sec, avg_batch_cost: 2.39805 sec, avg_samples: 50, avg_ips: 20.85030 sequences/sec
INFO 2022-09-20 10:58:40,702 trainer.py:167] iter: 74640/350000, Rank: 0, real_loss: 1.215568, fake_loss: 1.201060, discriminator_loss: 2.416629, generator_loss: 35.916458, generator_mel_loss: 25.964396, generator_kl_loss: 1.874138, generator_dur_loss: 2.341110, generator_adv_loss: 2.173471, generator_feat_match_loss: 3.563343, avg_reader_cost: 0.00039 sec, avg_batch_cost: 2.37029 sec, avg_samples: 50, avg_ips: 21.09451 sequences/sec
INFO 2022-09-20 10:58:43,096 trainer.py:167] iter: 74641/350000, Rank: 0, real_loss: 1.245020, fake_loss: 1.108281, discriminator_loss: 2.353301, generator_loss: 36.572319, generator_mel_loss: 26.206434, generator_kl_loss: 2.026091, generator_dur_loss: 2.339233, generator_adv_loss: 2.298341, generator_feat_match_loss: 3.702222, avg_reader_cost: 0.00032 sec, avg_batch_cost: 2.37950 sec, avg_samples: 50, avg_ips: 21.01280 sequences/sec
INFO 2022-09-20 10:58:45,468 trainer.py:167] iter: 74642/350000, Rank: 0, real_loss: 1.271199, fake_loss: 1.184860, discriminator_loss: 2.456059, generator_loss: 35.333931, generator_mel_loss: 25.618114, generator_kl_loss: 1.949838, generator_dur_loss: 2.322937, generator_adv_loss: 2.159736, generator_feat_match_loss: 3.283306, avg_reader_cost: 0.00030 sec, avg_batch_cost: 2.35818 sec, avg_samples: 50, avg_ips: 21.20278 sequences/sec
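
A handful of consecutive records says little about the trend, so it can help to pull generator_mel_loss out of the whole log and average it over windows. The following is a minimal sketch, assuming the record format shown in the log above; the function name `mel_loss_by_iter` and the abbreviated `sample` string are illustrative and not part of PaddleSpeech.

```python
import re
from statistics import mean

# Captures the iteration counter and the mel loss of one trainer.py record.
# re.S lets a record span a line break, as happens in pasted logs.
RECORD = re.compile(r"iter: (\d+)/\d+.*?generator_mel_loss: ([0-9.]+)", re.S)

def mel_loss_by_iter(log_text):
    """Return (iteration, generator_mel_loss) pairs found in log_text."""
    return [(int(i), float(loss)) for i, loss in RECORD.findall(log_text)]

# Two records abbreviated from the log above (only the fields used here).
sample = (
    "INFO 2022-09-20 10:58:28,841 trainer.py:167] iter: 74635/350000, Rank: 0, "
    "generator_mel_loss: 25.517664, generator_kl_loss: 1.996764 "
    "INFO 2022-09-20 10:58:31,169 trainer.py:167] iter: 74636/350000, Rank: 0, "
    "generator_mel_loss: 26.522982, generator_kl_loss: 1.727793"
)

pairs = mel_loss_by_iter(sample)
avg = mean(loss for _, loss in pairs)
print(pairs)
print(f"mean generator_mel_loss: {avg:.3f}")
```

Running the same function over the full log file and comparing the mean of an early window with a late one gives a steadier picture than reading individual lines.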

yt605155624 commented 2 years ago

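Sorry, vits-vc is code contributed by a developer. We currently have no plans to train this model, and we do not know how well it performs. As far as I know, though, VITS is hard to train: training a single-speaker VITS on csmsc took me 2 weeks on 4 V100 cards, so 3 days on a single V100 is definitely not enough to get results.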
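
To put those figures side by side, here is a rough back-of-the-envelope sketch. It uses only numbers quoted in this thread and in the posted log; the 2.37 s per iteration is an approximate average of the logged avg_batch_cost values. The csmsc run and this vits-vc run use different data and possibly different settings, so the comparison is only indicative.

```python
SECONDS_PER_DAY = 86_400

# Figures quoted in this thread.
reference_gpu_days = 4 * 14   # 4 x V100 for 2 weeks (single-speaker VITS on csmsc)
spent_gpu_days = 1 * 3        # 1 x V100 for 3 days (this vits-vc run)

# Figures read off the posted log; sec_per_iter is an approximation.
total_iters = 350_000
current_iter = 74_635
sec_per_iter = 2.37

budget_fraction = spent_gpu_days / reference_gpu_days
full_run_days = total_iters * sec_per_iter / SECONDS_PER_DAY
remaining_days = (total_iters - current_iter) * sec_per_iter / SECONDS_PER_DAY

print(f"compute spent vs. reference: {budget_fraction:.1%}")
print(f"full schedule at this speed: {full_run_days:.1f} days")
print(f"remaining at this speed: {remaining_days:.1f} days")
```

On these assumptions the run has used about 5% of the reference GPU-days, and the configured 350000 iterations alone would take roughly 9.6 days on one card at the logged speed.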

liukaiyueyuo commented 2 years ago

OK, thanks!