vincentspeech opened 4 weeks ago
The training config for the 17wh (170k-hour) model is cosyvoice.yaml. You can train from scratch if you have enough data.
How should the flow config be adjusted when training on 16 kHz samples? Also, is the x-vector used to train the flow an average over the speaker's samples, or a per-utterance one extracted from the target mel?
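The two x-vector options in the question can be illustrated with a generic sketch. It does not say which option CosyVoice actually uses. It also is not the repo's data pipeline: `conditioning_xvectors`, `speaker_average_xvector`, and the `utt2spk` mapping are hypothetical names, and the sketch assumes one x-vector has already been extracted per utterance and stored as a plain list of floats. Option A gives every utterance of a speaker the same embedding, the mean of that speaker's L2-normalized utterance x-vectors. Option B keeps each utterance's own embedding.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def speaker_average_xvector(utterance_xvectors: List[Vector]) -> Vector:
    """Option A: mean of a speaker's normalized utterance x-vectors, renormalized."""
    normalized = [l2_normalize(v) for v in utterance_xvectors]
    dim = len(normalized[0])
    mean = [sum(v[i] for v in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def conditioning_xvectors(
    xvectors: Dict[str, Vector],  # hypothetical: utterance id -> x-vector
    utt2spk: Dict[str, str],      # hypothetical: utterance id -> speaker id
    use_speaker_average: bool,
) -> Dict[str, Vector]:
    """Return the x-vector each training utterance would be conditioned on."""
    if not use_speaker_average:
        # Option B: each utterance keeps the embedding extracted from its own audio.
        return {utt: l2_normalize(v) for utt, v in xvectors.items()}
    # Option A: group utterances by speaker and share one averaged embedding.
    by_speaker: Dict[str, List[Vector]] = {}
    for utt, v in xvectors.items():
        by_speaker.setdefault(utt2spk[utt], []).append(v)
    spk_mean = {spk: speaker_average_xvector(vs) for spk, vs in by_speaker.items()}
    return {utt: spk_mean[utt2spk[utt]] for utt in xvectors}
```

With option A, two utterances from the same speaker get identical conditioning. With option B, they keep whatever per-utterance variation (channel, style) their own embeddings carry.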
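Thanks for open-sourcing this excellent project. I noticed, though, that for the base model's flow matching only the SFT training config and recipe seem to have been released. Will the recipe for training the 17wh model from scratch be open-sourced as well?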