k0ngolab opened 9 months ago
Hi,
I am currently replacing wav2vec with a better solution to support other languages. If it works, I will soon add a new model with the new mapping, along with the training code. Otherwise, I will update the repo with the wav2vec mapping.
I tried retraining the model and SyncNet with the latest version of DeepSpeech, but the results were not as good as with the originally trained model. The generalization and the expressiveness of the lip motion were not very convincing. An alternative would be to train a mapping model from the features of the latest DeepSpeech version to those of the original version used with DINet. This would keep the same trained DINet model while also keeping inference fast, since the latest version of DeepSpeech supports GPU and ONNX. I haven't had time to test it yet, but feel free to give it a try and open a PR.
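The mapping idea above can be sketched as a regression from per-frame features of the newer DeepSpeech model to the features the original DINet checkpoint expects. This is a minimal linear baseline (ridge regression over a small window of neighbouring frames), not a tested solution and not code from this repo. It assumes both feature streams were extracted from the same audio and are already time-aligned frame by frame; the function names and the `context` and `alpha` parameters are hypothetical.

```python
import numpy as np


def stack_context(feats: np.ndarray, context: int = 2) -> np.ndarray:
    """Concatenate each frame with `context` neighbours on each side (edge frames repeated)."""
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    # Slice i holds frame t + i - context for every t -> shape (T, D * (2 * context + 1)).
    return np.concatenate(
        [padded[i:i + num_frames] for i in range(2 * context + 1)], axis=1
    )


def design_matrix(src: np.ndarray, context: int) -> np.ndarray:
    """Context-stacked features plus a constant column for the bias term."""
    x = stack_context(src, context)
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_mapping(src: np.ndarray, tgt: np.ndarray, context: int = 2, alpha: float = 1e-3) -> np.ndarray:
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y.

    src: (T, D_new) features from the newer DeepSpeech (assumed input).
    tgt: (T, D_old) features from the original DeepSpeech used with DINet (assumed target).
    """
    x = design_matrix(src, context)
    gram = x.T @ x + alpha * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ tgt)


def apply_mapping(src: np.ndarray, weights: np.ndarray, context: int = 2) -> np.ndarray:
    """Project new-version features into the old feature space, frame by frame."""
    return design_matrix(src, context) @ weights
```

A closed-form linear fit keeps the experiment cheap; if it proves too weak to preserve lip expressiveness, a small neural network trained on the same aligned pairs would be the natural next step.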
Which version of DeepSpeech did you end up using, and how did you solve the dimension mismatch during training? Thank you.
I was using 0.9.1. The dimension issue arises mainly with other languages, such as Chinese. I tried learning a mapping from the obtained features to the expected dimensions, but this didn't always work well. Furthermore, DeepSpeech seems to cause problems with many different languages, which is why I am currently trying to rely mainly on mel spectrograms.
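For reference, a log-mel spectrogram of the kind mentioned above can be computed with plain NumPy. This is a generic sketch, not the feature extractor used in this repo: the 16 kHz sample rate, FFT size, 10 ms hop, number of mel bands, and the HTK-style mel formula are all assumptions. Libraries such as librosa (`librosa.feature.melspectrogram`) provide a tested implementation, although their default mel scale differs slightly from the one used here.

```python
import numpy as np


def hz_to_mel(f):
    """HTK-style mel scale (an assumption; other mel variants exist)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2 if fmax is None else fmax
    hz_points = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(wav, sr=16000, n_fft=512, hop=160, n_mels=80, eps=1e-10):
    """Return log mel energies with shape (num_frames, n_mels)."""
    wav = np.asarray(wav, dtype=np.float64)
    if len(wav) < n_fft:
        wav = np.pad(wav, (0, n_fft - len(wav)))
    num_frames = 1 + (len(wav) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([wav[i * hop:i * hop + n_fft] * window for i in range(num_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2  # (num_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (num_frames, n_mels)
    return np.log(mel + eps)  # eps avoids log(0) in silent regions
```

Unlike DeepSpeech character logits, these features do not depend on a language-specific alphabet, so their dimension stays fixed across languages.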
Hello,
Could you please add the code for training the wav2vec-to-DeepSpeech feature mapping?
Thank you.