Closed: raghavendrajain closed this issue 2 years ago
We have big Japanese model too, you can contact us and describe your project if you are interested.
@nshmyrev Thank you very much! I am developing an application for users to learn public speaking. The user records audio of their speech, which is then analyzed by AI. This requires speech-to-text, and currently I have been using the WebSpeech API. My focus is on open-source tools only, because students rarely have money to spend. However, the WebSpeech API does not give timestamps and runs only in browsers, so I would like to use the VOSK API instead. Now that we are able to use it on GPU, I am pretty excited. Please give me the big Japanese model; that would really help me a lot! Thank you.
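For anyone else needing the timestamps mentioned above: with the Vosk Python bindings, calling `SetWords(True)` on a `KaldiRecognizer` makes each final result's JSON include per-word start/end times in a `"result"` array. A minimal sketch of extracting them (the sample values below are illustrative, not real recognizer output; the model path is an assumption):

```python
import json

# With a real recognizer you would do roughly:
#   from vosk import Model, KaldiRecognizer
#   model = Model("vosk-model-small-ja-0.22")   # assumed local model path
#   rec = KaldiRecognizer(model, 16000)
#   rec.SetWords(True)                          # enable per-word timestamps
#   ...feed PCM audio..., then result_json = rec.FinalResult()
# Here we just parse a sample of the JSON shape that produces:
sample_result = json.dumps({
    "result": [
        {"word": "こんにちは", "start": 0.33, "end": 1.02, "conf": 0.97},
        {"word": "皆さん",   "start": 1.10, "end": 1.75, "conf": 0.92},
    ],
    "text": "こんにちは 皆さん",
})

def word_timestamps(result_json):
    """Return (word, start_sec, end_sec) tuples from a Vosk final result."""
    data = json.loads(result_json)
    return [(w["word"], w["start"], w["end"]) for w in data.get("result", [])]

print(word_timestamps(sample_result))
```

The same parsing works for each partial utterance result, so a speech-coaching app can align feedback to the exact moment a word was spoken.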
You need to mail contact@alphacephei.com to get the model.
I have sent an email, thanks a lot!
Also #585
Is there a reason the big model is not available for everyone? I don't have a special project, I just want to use it in Subtitle Edit, and the small model is not very helpful.
@n-99 we have just released big model for Japanese
https://alphacephei.com/vosk/models/vosk-model-ja-0.22.zip
Let us know how it feels
@nshmyrev Thanks! I just saw it. It's noticeably more accurate, but there's still a long way to go. I'm pretty sure all Japanese models have that problem, since Google's YouTube subtitles also perform very badly on Japanese audio. I don't know if it's a feature of the language (lots of homophones due to fewer syllables) or of the general focus (more resources for English, Spanish, etc.). Eagerly awaiting the day when these things work near flawlessly; there are so many unsubbed movies, shows, and videos.
Hi, @nshmyrev .
We're having a similar issue when using the vosk-android
SDK with the small model: https://alphacephei.com/vosk/models/vosk-model-small-ja-0.22.zip
The big model above is nearly 1 GB, which makes it impractical to bundle into a mobile app.
Would it be possible to release another small model for Japanese that includes a fix for the issue mentioned above?
Thanks a lot.
@tuan-jason What issue do you have, please? This ticket is about GPU; is that relevant for Android? Let me know the details and we can try to fix it.
@nshmyrev The description of our issue is as below:
The app fails to recognize my speech correctly while talking.
I say "こんにちは" (Konnichiwa), and Vosk STT recognizes it as "今日は" (which usually means "today is… (Kyou wa)").
"今日は" does not convey "こんにちは" here:
"今日は" *can* be pronounced "Konnichiwa", but it is usually read "Kyou wa",
so this is a typical Japanese STT failure.
The expected output is "こんにちは".
The problem seems to be an inappropriate hiragana-to-kanji conversion. However, the Vosk model does not output hiragana and then convert it to kanji; it outputs kanji directly.
Here's a video of the issue:
https://github.com/alphacep/vosk-api/assets/168809162/a986527c-4e7c-4b4d-94aa-2dcb848c8d6d
Hope the explanation of the issue is clear to you.
@nshmyrev Oops, sorry it's my bad. I replied on a wrong thread. The original issue I was referring to is this one: https://github.com/alphacep/vosk-api/issues/1047.
I moved my comment to the correct thread now. Could you please take a look?
Hi, how do I use the Japanese model with GPU? It seems only the large EN model can be used with GPU support as of now. Please help, thanks!