Can the Japanese vosk model be modified to return kana-only results?

alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node

Apache License 2.0


Can the Japanese vosk model be modified to return kana-only results? #1047

Status: Closed (coastal45 closed this issue 2 years ago)

coastal45 commented 2 years ago

I don't know the inner workings of vosk or the language models, but for Japanese I think the logical process would be audio --> kana (phonetic representation) --> kanji character(s).

While kanji character selection seems decent in very clear speech, in not so clear speech the selection is rather poor. In those cases, I think returning a strictly phonetic representation might be more helpful, and allow the user to take it from there.

So, would it be possible to have a modified version of vosk-model-small-ja-0.22 that would return kana only results?

nshmyrev commented 2 years ago

"...and allow the user to take it from there"

How do you plan to "take it from there"?

coastal45 commented 2 years ago

I mean to allow the user to choose the correct kanji character(s) based on the returned kana characters. I think automatic kanji selection can be incorrect, especially if any phoneme is incorrectly judged. Then working backwards to figure out what the model "heard" is more difficult than choosing the needed kanji myself. And it's easier to spot any incorrect phoneme. So after the language model returns all-kana results, further operations are performed by the user.

coastal45 commented 2 years ago

For example:

Audio sampling returns: to-ku-be-tsu
Kana conversion returns: とくべつ
Kanji conversion returns: 特別 (special)

As it is, the correct kanji will be returned.

But let's say the audio quality is not so good, and then:

Audio sampling returns: to-ki-be-tsu
Kana conversion returns: ときべつ
Kanji conversion returns: 時別

For each kanji it's ok, とき=時 and べつ=別, but the combination is meaningless.

So in the case of kana only results, I can tell that ときべつ should be とくべつ, either by similarity or context. Going backwards from the kanji would be more difficult, as individual kanji can have different pronunciations/meaning depending on usage.
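The "by similarity" step described above can be sketched in a few lines. This is only an illustration, not part of vosk: the `LEXICON` dictionary and the `suggest` helper are hypothetical, and a real word list would be far larger. It uses Python's standard `difflib` to rank known kana readings by closeness to the recognized string.

```python
import difflib

# Hypothetical mini-lexicon: kana reading -> usual written form.
LEXICON = {
    "とくべつ": "特別",
    "とくしゅ": "特殊",
    "くべつ": "区別",
    "ときどき": "時々",
}

def suggest(kana, lexicon=LEXICON, n=3, cutoff=0.6):
    """Return (reading, written form) pairs whose reading is close to `kana`."""
    matches = difflib.get_close_matches(kana, list(lexicon), n=n, cutoff=cutoff)
    return [(reading, lexicon[reading]) for reading in matches]

# The misheard "ときべつ" differs from "とくべつ" by a single kana.
print(suggest("ときべつ"))
```

With this toy lexicon, the only candidate above the cutoff for ときべつ is とくべつ (特別). Doing the same comparison starting from 時別 would first require guessing which reading of each kanji the recognizer had in mind.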

nshmyrev commented 2 years ago

Ok, sure, we can create such a model, but it will take a bit of time to create the corresponding LM. Like you said, there is no direct mapping from kanji to kana, so we need to somehow map properly while training the LM. Maybe mecab can do that, but not very accurately.

If you can create a kana-only 3-gram LM, I'll recompile the graph for you using this LM.
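To make the data-preparation side of a kana-only 3-gram LM a little more concrete, here is a minimal sketch under stated assumptions. It assumes the mixed kanji/kana text has already been converted to space-separated readings by some morphological analyzer, and that those readings come out in katakana (this depends on the analyzer and dictionary). The sample `readings` lines and both helper functions are hypothetical. It only normalizes to hiragana and counts word-level 3-grams; building an actual LM file from such text would be done with an LM toolkit.

```python
from collections import Counter

def katakana_to_hiragana(text):
    """Shift katakana (U+30A1..U+30F6) to hiragana; leave other characters alone."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )

def trigram_counts(sentences):
    """Count word-level 3-grams, with <s> and </s> as sentence boundary markers."""
    counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:], tokens[2:]))
    return counts

# Hypothetical analyzer output: one sentence per line, readings in katakana.
readings = ["トクベツ ナ ヒ", "トクベツ ナ モノ"]
corpus = [katakana_to_hiragana(line) for line in readings]
counts = trigram_counts(corpus)

for gram, count in counts.most_common(3):
    print(" ".join(gram), count)
```

The accuracy caveat above still applies: the counts are only as good as the readings the analyzer picks for each kanji.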

coastal45 commented 2 years ago

I see. I wasn't thinking this way, but I guess the LM training would have to use kanji/kana mixed text samples as it's normally written. Since kana only text/audio examples are unlikely, I suppose training by kana only would not be possible. It's the reverse from writing Japanese, which may have confused me.

Is my reasoning correct? If so, I'll need to come up with another idea to deal with my issue.

nshmyrev commented 2 years ago

@coastal45 we are about to release a big Japanese model in the coming days. It will be much more accurate even with kanji.

nshmyrev commented 2 years ago

We have just released a big model for Japanese:

https://alphacephei.com/vosk/models/vosk-model-ja-0.22.zip
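For anyone trying the model from Python rather than through a GUI tool, a minimal sketch follows. It assumes the zip has been unpacked to a local directory (the `model_dir` argument is a placeholder) and that the input is a 16-bit mono PCM WAV file. `transcribe` and `result_text` are illustrative helper names, not part of the vosk package.

```python
import json
import wave

def result_text(result_json):
    """Extract the "text" field from a Vosk JSON result string."""
    return json.loads(result_json).get("text", "")

def transcribe(wav_path, model_dir):
    """Feed a 16-bit mono PCM WAV file to a Vosk model and return the text."""
    # Imported here so result_text() can be used without vosk installed.
    from vosk import Model, KaldiRecognizer

    model = Model(model_dir)
    pieces = []
    with wave.open(wav_path, "rb") as wf:
        rec = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if not data:
                break
            # AcceptWaveform returns true when an utterance has ended.
            if rec.AcceptWaveform(data):
                pieces.append(result_text(rec.Result()))
        pieces.append(result_text(rec.FinalResult()))
    return " ".join(p for p in pieces if p)
```

Usage would look like `transcribe("sample.wav", "vosk-model-ja-0.22")`, with both paths adjusted to the local setup.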

coastal45 commented 2 years ago

Thanks, I see it. I am using vosk integrated with SubtitleEdit. I'll try it out as soon as they add it to the selection menu.

coastal45 commented 2 years ago

Impressive is all I can say. Accuracy is far greater using the big model. Kanji placement is quite good. This issue can be closed as it is no longer relevant.

nshmyrev commented 2 years ago

Great, thanks for testing.

tuan-jason commented 4 months ago

Hi, @nshmyrev. We're having a similar issue when using the vosk-android SDK with the small model: https://alphacephei.com/vosk/models/vosk-model-small-ja-0.22.zip

The big model above is nearly 1GB in size, which makes it unsuitable for integration into a mobile app.

Would it be possible for you to release another small model for Japanese that includes the improvements addressing the issue described above?

More details about our issue can be found here: https://github.com/alphacep/vosk-api/issues/937#issuecomment-2162002516

Thanks a lot.

tuan-jason commented 3 months ago

@nshmyrev Sorry if this mention has spammed you. I just want to make sure you didn't miss my comment above. Could you take a look at my issue when you have some free time? I'd really appreciate it.

nshmyrev commented 3 months ago

@tuan-jason please create a new issue about your problem and add a description.