Elsaam2y / DINet_optimized

An optimized pipeline for DINet that reduces inference latency by up to 60% 🚀. Kudos to the authors of the original repo for this amazing work.
93 stars, 15 forks

Wav2vec mapping code #12

Open: k0ngolab opened this issue 9 months ago

k0ngolab commented 9 months ago

Hello,

Could you please add the code used to train the mapping from wav2vec features to DeepSpeech features?

Thank you.

Elsaam2y commented 9 months ago

Hi,

I am currently in the process of replacing wav2vec with a better solution that supports other languages. If it works, I will soon add a new model with a new mapping, along with the training code. Otherwise, I will update the repo with the wav2vec mapping code.

Elsaam2y commented 7 months ago

I tried retraining the model and SyncNet with the latest version of DeepSpeech, but this didn't give good results compared to the originally trained model: the generalization and expressiveness of the lip motion were not very convincing. An alternative would be to train a mapping model from the features of the latest DeepSpeech version to those of the original version used with DINet. This would keep the same trained DINet model while keeping inference fast, since the latest version of DeepSpeech supports GPU and ONNX. I haven't had time to test it yet, but feel free to give it a try and open a PR.
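The mapping idea in the comment above can be framed as a plain regression problem: run both DeepSpeech versions on the same audio clips, align their per-frame features in time, and fit a function from the new features to the old ones so the existing DINet model can consume them unchanged. The sketch below is a minimal illustration under stated assumptions, not the author's method: it assumes NumPy, uses closed-form ridge (linear) regression as a stand-in for whatever mapping model one would actually train, aligns frame counts by linear interpolation, and uses hypothetical helper names (`resample_frames`, `fit_feature_mapping`, `apply_mapping`) and hypothetical feature sizes (40 in, 29 out) in the demo.

```python
import numpy as np
from typing import Sequence, Tuple


def resample_frames(feats: np.ndarray, n_frames: int) -> np.ndarray:
    """Linearly interpolate a (T, D) feature sequence to (n_frames, D) along time."""
    t_src = np.linspace(0.0, 1.0, feats.shape[0])
    t_dst = np.linspace(0.0, 1.0, n_frames)
    # np.interp is 1-D, so interpolate each feature dimension separately.
    return np.stack(
        [np.interp(t_dst, t_src, feats[:, d]) for d in range(feats.shape[1])], axis=1
    )


def fit_feature_mapping(
    new_feats: Sequence[np.ndarray],
    old_feats: Sequence[np.ndarray],
    ridge: float = 1e-3,
) -> Tuple[np.ndarray, np.ndarray]:
    """Fit W, b minimising ||X @ W + b - Y||^2 + ridge * ||W||^2 in closed form.

    new_feats[i]: (T_i, D_new) features from the newer DeepSpeech for clip i.
    old_feats[i]: (T_i', D_old) features from the original DeepSpeech for clip i.
    """
    xs, ys = [], []
    for x, y in zip(new_feats, old_feats):
        # Assumption: both sequences span the same audio, so stretching the
        # new features to the old frame count aligns them in time.
        xs.append(resample_frames(x, y.shape[0]))
        ys.append(y)
    X = np.concatenate(xs)
    Y = np.concatenate(ys)
    # Append a column of ones so the bias is solved jointly with W.
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    reg = ridge * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # leave the bias unregularised
    Wb = np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ Y)
    return Wb[:-1], Wb[-1]


def apply_mapping(feats: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Map (T, D_new) features into the (T, D_old) space the old model expects."""
    return feats @ W + b


if __name__ == "__main__":
    # Synthetic stand-in data; real pairs would come from running both
    # DeepSpeech versions on the same audio clips.
    rng = np.random.default_rng(0)
    true_W, true_b = rng.normal(size=(40, 29)), rng.normal(size=29)
    new = [rng.normal(size=(100, 40)) for _ in range(5)]
    old = [x @ true_W + true_b for x in new]
    W, b = fit_feature_mapping(new, old, ridge=1e-6)
    # On this exactly linear synthetic data the error should be tiny.
    print("max abs error:", np.abs(apply_mapping(new[0], W, b) - old[0]).max())
```

A linear fit is only the simplest instance of this idea; if the two feature spaces are related nonlinearly, a small MLP or temporal convolution trained on the same aligned pairs would be the natural next step.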

tailangjun commented 3 months ago

Which version of DeepSpeech did you end up using, and how did you solve the problem of mismatched dimensions during training? Thank you.

Elsaam2y commented 2 months ago

I was using 0.9.1. The dimension issue arises mainly with other languages, such as Chinese. I tried learning a mapping from the obtained features to the expected dimensions, but this didn't always work well. Furthermore, DeepSpeech seems to cause problems with many different languages, which is why I am currently trying to rely mainly on mel spectrograms.
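For the mel-spectrogram route mentioned above, a minimal log-mel front end can be written with NumPy alone. The author does not state which parameters are used, so the values here (16 kHz audio, an 800-sample Hann window, a 200-sample hop, 80 HTK-style mel bands) are illustrative assumptions rather than the repo's settings, and the helper names are hypothetical. In practice a library routine such as `librosa.feature.melspectrogram` would usually be used instead.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale.
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters of shape (n_mels, n_fft // 2 + 1)."""
    fmax = sr / 2 if fmax is None else fmax
    # n_mels + 2 equally spaced mel points give each triangle its left/center/right edge.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(wav, sr=16000, n_fft=800, hop=200, n_mels=80, eps=1e-5):
    """Return a (frames, n_mels) log-mel spectrogram of a mono waveform."""
    window = np.hanning(n_fft)
    # Reflect-pad half a window on each side so edge samples get centred frames.
    pad = n_fft // 2
    x = np.pad(wav, (pad, pad), mode="reflect")
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2  # (T, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T  # (T, n_mels)
    return np.log(np.maximum(mel, eps))
```

Because the output width is fixed by `n_mels` rather than by a language-specific output alphabet, the feature shape is the same for every language, which sidesteps the kind of dimension mismatch discussed above.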