Elsaam2y / DINet_optimized

An optimized pipeline for DINet that reduces inference latency by up to 60% 🚀. Kudos to the authors of the original repo for their amazing work.

Whisper instead of wav2vec? #8

davidmartinrius closed this issue 9 months ago

davidmartinrius commented 9 months ago

Hello, how are you doing? I have a question for you: is it worth using Whisper instead of wav2vec? This might solve the multilingual problem, since Whisper detects the language and the emissions could be more accurate.

I suppose wav2vec is better in terms of performance, but it doesn't solve the language problem at all.
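As a rough illustration of what "emissions" means here, assuming the term refers to the per-frame label scores that a CTC-style model such as wav2vec 2.0 produces: each audio frame gets one score per label, and a decoder turns those scores into text. The label set, the scores, and the `greedy_ctc_decode` helper below are all made up for this sketch and are not taken from wav2vec, Whisper, or this repo.

```python
# Toy illustration only: labels and scores are invented for this sketch.
LABELS = ["-", "h", "i"]  # index 0 plays the role of the CTC blank (assumption)


def greedy_ctc_decode(emissions, labels=LABELS, blank=0):
    """Pick the best label per frame, merge consecutive repeats, drop blanks."""
    # Index of the highest-scoring label in each frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in emissions]
    decoded = []
    prev = None
    for idx in best:
        # Keep a label only when it differs from the previous frame and is not blank.
        if idx != prev and idx != blank:
            decoded.append(labels[idx])
        prev = idx
    return "".join(decoded)


# One row per audio frame, one score per label in LABELS.
toy_emissions = [
    [0.10, 0.80, 0.10],  # best label: "h"
    [0.10, 0.70, 0.20],  # "h" again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank, dropped
    [0.20, 0.10, 0.70],  # best label: "i"
]

print(greedy_ctc_decode(toy_emissions))
```

The point for the multilingual question is that the columns of the emission matrix are fixed by the label set the model was trained with, so a model trained on one language's characters can only score those labels, whatever language is spoken in the audio.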

What do you think about that? Do you think it is a good option?

Thank you!

Elsaam2y commented 9 months ago

Hi @davidmartinrius

I am doing well and hope the same for you. Actually, this could be very interesting. Recently I was trying to replace wav2vec with the latest version of DeepSpeech, but I haven't tried Whisper. I will have a look at it soon.

Thanks

foxyear-kyumin commented 9 months ago

I’m looking forward to sharing with you the methods for using the latest version of DeepSpeech.

Elsaam2y commented 9 months ago

@qiu8888 if you have tried the latest version and it worked fine, please feel free to open an MR.

davidmartinrius commented 9 months ago

I am closing this. wav2vec should definitely be enough, and it has better performance. Whisper could work as well, but after reading the post linked below I realized that wav2vec is quite good for the task in this project. (I am not affiliated with them.) https://deepgram.com/learn/benchmarking-top-open-source-speech-models