Elsaam2y / DINet_optimized

An optimized pipeline for DINet that reduces inference latency by up to 60% 🚀. Kudos to the authors of the original repo for this amazing work.

Whisper instead of wav2vec? #8

Closed: davidmartinrius closed this issue 1 year ago

davidmartinrius commented 1 year ago

Hello, how are you doing? I have a question for you: is it worth using Whisper instead of wav2vec? This might solve the multilanguage problem, since Whisper detects the language and its emissions could be more accurate.

I suppose wav2vec is better in terms of performance, but it doesn't solve the language problem at all.
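Whichever encoder is used, swapping it in mostly comes down to producing audio features on the same time grid the lip-sync model expects. The sketch below assumes the pipeline consumes one audio feature vector per video frame and that the video is 25 fps; both are assumptions, not details confirmed in this thread. It relies on the nominal 50 Hz output rate shared by wav2vec 2.0 (16 kHz input, 320-sample stride) and the Whisper encoder (1500 frames per 30 s window). `align_features_to_video` is a hypothetical helper, not part of DINet_optimized, wav2vec, or Whisper.

```python
from typing import List

Frame = List[float]

# Nominal encoder output rates in frames per second.
# wav2vec 2.0: 16 kHz audio with a total CNN stride of 320 samples -> ~50 Hz.
# Whisper encoder: 1500 frames per 30 s input window -> 50 Hz.
WAV2VEC2_HZ = 50
WHISPER_ENCODER_HZ = 50

# Assumption: the talking-head video runs at 25 fps; adjust for your pipeline.
VIDEO_FPS = 25


def align_features_to_video(features: List[Frame], feature_hz: int, video_fps: int) -> List[Frame]:
    """Hypothetical helper: average encoder frames so there is one vector per video frame.

    If feature_hz > video_fps, consecutive feature frames are averaged.
    If feature_hz < video_fps, each feature frame is repeated.
    """
    if not features:
        return []
    n_video = len(features) * video_fps // feature_hz
    dim = len(features[0])
    aligned: List[Frame] = []
    for i in range(n_video):
        # Integer arithmetic avoids float rounding at frame boundaries.
        start = i * feature_hz // video_fps
        end = max((i + 1) * feature_hz // video_fps, start + 1)
        end = min(end, len(features))
        window = features[start:end]
        aligned.append([sum(f[d] for f in window) / len(window) for d in range(dim)])
    return aligned
```

For example, 2 s of 50 Hz encoder output (100 frames) becomes 50 vectors for a 25 fps video, each the mean of two encoder frames. Because both encoders share the same nominal rate, this alignment step would stay the same if wav2vec were replaced by Whisper; only the feature dimension changes.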

What do you think about that? Do you think it is a good option?

Thank you!

Elsaam2y commented 1 year ago

Hi @davidmartinrius

I am doing well and hope the same for you. Actually, this could be very interesting. Recently I was trying the latest version of DeepSpeech as a replacement for wav2vec, but I haven't tried Whisper. I will have a look at it soon.

Thanks

foxyear-kyumin commented 1 year ago

I’m looking forward to sharing with you the methods for using the latest version of DeepSpeech.

Elsaam2y commented 1 year ago

@qiu8888 If you have tried the latest version and it worked fine, please feel free to open an MR.

davidmartinrius commented 1 year ago

I am closing this. wav2vec should definitely be enough, and it has better performance. Whisper could work as well, but after reading this post I realized that wav2vec is quite good for the task in this project (I am not affiliated with them): https://deepgram.com/learn/benchmarking-top-open-source-speech-models