Open SuperiorDtj opened 5 months ago
Hi, in the inference code, speecht5 is loaded initially with public weights, but the parameters are overwritten again with state_dict from the VoiceLDM checkpoint.
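For anyone following along, here is a minimal sketch of the load-then-overwrite pattern described above. It uses toy modules rather than the real SpeechT5 or VoiceLDM classes, so `ToyTextEncoder`, `ToyModel`, and the layer sizes are all hypothetical stand-ins. Only `state_dict()` and `load_state_dict()` are actual PyTorch calls.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)


# Hypothetical stand-in for the text encoder; NOT the real SpeechT5 class.
class ToyTextEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(16, 8)
        self.proj = nn.Linear(8, 8)


# Hypothetical stand-in for the full model that wraps the text encoder.
class ToyModel(nn.Module):
    def __init__(self, text_encoder):
        super().__init__()
        self.text_encoder = text_encoder
        self.head = nn.Linear(8, 4)


# 1) "Public" weights: whatever the encoder starts out with.
public_encoder = ToyTextEncoder()
public_weights = copy.deepcopy(public_encoder.state_dict())

# 2) A "trained checkpoint": same architecture, different parameter values.
trained = ToyModel(ToyTextEncoder())
with torch.no_grad():
    for p in trained.parameters():
        p.add_(1.0)  # simulate the effect of training
checkpoint = copy.deepcopy(trained.state_dict())

# 3) Inference: build the model around the public encoder, then overwrite
#    every parameter (encoder included) from the checkpoint.
model = ToyModel(public_encoder)
model.load_state_dict(checkpoint)

overwritten = not torch.equal(
    model.text_encoder.embed.weight, public_weights["embed.weight"]
)
matches_checkpoint = torch.equal(
    model.text_encoder.embed.weight, checkpoint["text_encoder.embed.weight"]
)
print(overwritten, matches_checkpoint)
```

Because the checkpoint's keys include the `text_encoder.*` entries, the initial public weights only serve to construct the module; the values used at inference are the ones saved after training.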
Thanks for your quick reply!
I have another question, if you don't mind. Could a regular phoneme-sequence embedding network achieve the same effect in place of SpeechT5? In other words, is SpeechT5 necessary for modeling duration information, or could a regular nn.embedder + Durator achieve similar results?
No, using SpeechT5 isn't strictly necessary; any form of 'text encoder' would likely do the job. As for using a single nn.embedder before Durator, I believe it's possible, but the linguistic modeling performance would likely be quite poor.
Thanks for your advice! It's very helpful for my research!
Have you tried freezing the parameters of SpeechT5? Or, is it necessary to update the text encoder parameters in this TTS modeling approach?
I have tried both, and found that updating the text encoder's parameters led to better performance.
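To make the two options concrete, here is a rough sketch of freezing versus fine-tuning a text encoder in PyTorch. The `ToyTTS` model, its layer sizes, and the dummy loss are hypothetical and are not taken from the VoiceLDM training code; the sketch only shows the mechanics of `requires_grad_`, not the performance difference reported above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


# Hypothetical toy model; NOT the VoiceLDM architecture.
class ToyTTS(nn.Module):
    def __init__(self):
        super().__init__()
        self.text_encoder = nn.Sequential(nn.Embedding(16, 8), nn.Linear(8, 8))
        self.decoder = nn.Linear(8, 4)

    def forward(self, tokens):
        return self.decoder(self.text_encoder(tokens))


def train_step(model: ToyTTS, freeze_encoder: bool) -> bool:
    """Run one SGD step; return True if the encoder weights changed."""
    # Freezing = turning off gradients for every encoder parameter.
    model.text_encoder.requires_grad_(not freeze_encoder)

    # Hand the optimizer only the parameters that should be updated.
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.1)

    before = model.text_encoder[1].weight.detach().clone()
    tokens = torch.randint(0, 16, (2, 5))
    loss = model(tokens).pow(2).mean()  # dummy loss, for illustration only

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return not torch.equal(before, model.text_encoder[1].weight)


frozen_changed = train_step(ToyTTS(), freeze_encoder=True)
tuned_changed = train_step(ToyTTS(), freeze_encoder=False)
print(frozen_changed, tuned_changed)
```

With the encoder frozen, only the decoder receives updates; with it unfrozen, the encoder weights move along with the rest of the model, which is the setting that worked better in the experiments mentioned above.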
Thanks for your reply! It's very helpful for my research!
I found that in the training code, speecht5 can be trained. However, in the inference code, speecht5 is loaded with Microsoft's public weights. Could you please clarify whether training speecht5 affects the results?