alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

Japanese model with GPU #937

Closed raghavendrajain closed 2 years ago

raghavendrajain commented 2 years ago

Hi, how do I use the Japanese model with GPU? It seems only a large EN model can be used with GPU support as of now. Please help, thanks!

nshmyrev commented 2 years ago

We have a big Japanese model too. You can contact us and describe your project if you are interested.

raghavendrajain commented 2 years ago

@nshmyrev Thank you very much! I am developing an application that helps users learn public speaking. The user records audio of their speech, which is then analyzed by AI. This requires speech-to-text, and I have been using the WebSpeech API so far. My focus is on using only open-source tools because students hardly have money to spend. However, the WebSpeech API does not give timestamps and runs in browsers only, so I would like to use the VOSK API. Now that we are able to use it on GPU, I am pretty excited. Please give me the big Japanese model, that would really help me a lot! Thank you.

nshmyrev commented 2 years ago

You need to mail contact@alphacephei.com to get the model.

raghavendrajain commented 2 years ago

I have sent an email, thanks a lot!

nshmyrev commented 2 years ago

Also #585

n-99 commented 2 years ago

Is there a reason the big model is not available for everyone? I don't have a special project; I just want to use it in Subtitle Edit, and the small model is not very helpful.

nshmyrev commented 1 year ago

@n-99 we have just released a big model for Japanese:

https://alphacephei.com/vosk/models/vosk-model-ja-0.22.zip

Let us know how it feels
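
For anyone who wants to try the model from Python and get the word timestamps mentioned earlier in the thread, a minimal sketch follows. It assumes the `vosk` package is installed, that the archive was unpacked into a directory named `vosk-model-ja-0.22`, and that the input is a 16-bit mono PCM WAV file. The helper names `word_timestamps` and `transcribe` are made up for illustration and are not part of Vosk. This sketch uses the regular CPU recognizer; it does not cover the GPU batch API.

```python
import json
import wave


def word_timestamps(result_json):
    """Extract (word, start, end) tuples from one Vosk result JSON string."""
    result = json.loads(result_json)
    # The "result" list is only present when word output is enabled
    # and at least one word was recognized.
    return [(w["word"], w["start"], w["end"]) for w in result.get("result", [])]


def transcribe(wav_path, model_dir="vosk-model-ja-0.22"):
    """Transcribe a WAV file and return a list of (word, start, end) tuples.

    model_dir is an assumed location of the unpacked model.
    """
    # Imported lazily so the parsing helper above works without vosk installed.
    from vosk import Model, KaldiRecognizer

    words = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        rec.SetWords(True)  # ask for per-word start/end times
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns True when an utterance has ended.
            if rec.AcceptWaveform(data):
                words.extend(word_timestamps(rec.Result()))
        words.extend(word_timestamps(rec.FinalResult()))
    return words
```

Calling `transcribe("speech.wav")` (a hypothetical file name) would return entries with start and end times in seconds.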

n-99 commented 1 year ago

@nshmyrev Thanks! I just saw it. It's noticeably more accurate, but there's still a long way to go. I'm pretty sure all Japanese models have that problem, since Google's YouTube subs perform very badly on Japanese audio, too. I don't know if it's a feature of the language (lots of homophones due to fewer syllables) or the general focus (more resources for English, Spanish, etc.). Eagerly awaiting the day when those things work near flawlessly; there are so many unsubbed movies, shows and videos.

tuan-jason commented 3 weeks ago

Hi @nshmyrev. We're having a similar issue when using the vosk-android SDK with the small model: https://alphacephei.com/vosk/models/vosk-model-small-ja-0.22.zip

The big model above is nearly 1 GB in size, which makes it unsuitable for integration into a mobile app.

Would it be possible for you to release another small model for Japanese that includes the improvement fixing the above-mentioned issue?

Thanks a lot.

nshmyrev commented 3 weeks ago

@tuan-jason What issue do you have, please? The ticket is about GPU; is it relevant for Android? Let me know the details and we can try to fix it.

tuan-jason commented 3 weeks ago

@nshmyrev The description of our issue is as below:

There is a failure when I record my voice in the app while talking.
I just say "こんにちは" (Konnichiwa), and Vosk STT recognizes it as "今日は", which means "today is…" (Kyou wa).

"今日は" doesn't mean "こんにちは".
("今日は" can be pronounced "Konnichiwa", but it is usually pronounced "Kyou wa", so this is a typical failure of Japanese STT.)
The expected output is "こんにちは".

The problem seems to be that inappropriate hiragana-to-kanji conversion is taking place. However, the Vosk model does not output hiragana and then convert it to kanji; it outputs kanji from the beginning.

Here's a video of the issue:

https://github.com/alphacep/vosk-api/assets/168809162/a986527c-4e7c-4b4d-94aa-2dcb848c8d6d

Hope the explanation of the issue is clear to you.
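
Since the model emits kanji directly, one possible stopgap on the application side is to post-process the recognized text. The sketch below is only an illustration of that idea and not something the Vosk maintainers suggested. The lookup table `GREETING_FIXES` and the function `normalize_greeting` are hypothetical, and it assumes the recognizer returns tokens separated by spaces. It rewrites the text only when the whole utterance is one of the listed kanji spellings, so a sentence that merely starts with "今日は" is left alone.

```python
# Hypothetical lookup table: kanji spellings that are usually meant as
# greetings when they make up the entire utterance.
GREETING_FIXES = {
    "今日は": "こんにちは",
    "今晩は": "こんばんは",
}


def normalize_greeting(text):
    """Return the hiragana greeting if the whole utterance is a known kanji spelling."""
    # Drop the spaces between recognized tokens before looking up the utterance.
    compact = "".join(text.split())
    # Fall back to the unchanged text when the utterance is not in the table.
    return GREETING_FIXES.get(compact, text)
```

A table like this cannot resolve genuinely ambiguous cases; it only covers short utterances where the greeting reading is far more likely.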

tuan-jason commented 3 weeks ago

@nshmyrev Oops, sorry, my bad. I replied in the wrong thread. The original issue I was referring to is this one: https://github.com/alphacep/vosk-api/issues/1047.

I have now moved my comment to the correct thread. Could you please take a look?