FunAudioLLM / SenseVoice

Multilingual Voice Understanding Model
https://funaudiollm.github.io/

SenseVoiceSmall cannot recognize English abbreviations in Japanese speech #81

Open jason-ni opened 1 month ago

jason-ni commented 1 month ago

🐛 Bug

When I run SenseVoiceSmall ASR on a Japanese news audio clip, all English abbreviations are dropped from the transcript. Below is a comparison of the SenseVoiceSmall output and the whisper.cpp output for the same audio:

SenseVoice

今回のジャパンモビリティーショでは様々な電気自動車が展示されました 例えば中国のは4つの車輪それぞれに独立したモーターをつけた車を発表

whisper.cpp

[00:00:00.600 --> 00:00:06.540]  今回のジャパンモビリティショーでは様々なEV、電気自動車が展示されました
[00:00:06.540 --> 00:00:14.720]  例えば中国のBYDは4つの車輪それぞれに独立したモーターをつけた車を発表
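
To make the difference easier to see, here is a small sketch that lists the Latin-letter tokens present in the whisper.cpp transcript but absent from the SenseVoice one. It uses only the Python standard library; the helper names `latin_tokens` and `missing_latin_tokens` are my own and not part of either project, the whisper.cpp timestamps are stripped by hand, and treating any run of two or more ASCII letters as an "abbreviation" is a simplifying assumption.

```python
import re
import unicodedata

# Transcripts copied from the outputs above (whisper.cpp timestamps removed).
sensevoice = (
    "今回のジャパンモビリティーショでは様々な電気自動車が展示されました "
    "例えば中国のは4つの車輪それぞれに独立したモーターをつけた車を発表"
)
whisper = (
    "今回のジャパンモビリティショーでは様々なEV、電気自動車が展示されました "
    "例えば中国のBYDは4つの車輪それぞれに独立したモーターをつけた車を発表"
)


def latin_tokens(text):
    """Return runs of two or more ASCII letters, in order of appearance.

    NFKC normalization folds full-width letters into ASCII first, so
    both spellings of an abbreviation are treated the same way.
    """
    normalized = unicodedata.normalize("NFKC", text)
    return re.findall(r"[A-Za-z]{2,}", normalized)


def missing_latin_tokens(reference, hypothesis):
    """List tokens found in `reference` that do not occur in `hypothesis`."""
    present = set(latin_tokens(hypothesis))
    return [tok for tok in latin_tokens(reference) if tok not in present]


print(missing_latin_tokens(whisper, sensevoice))  # -> ['EV', 'BYD']
```

Both abbreviations are missing entirely, rather than being transcribed in katakana or misspelled.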

To Reproduce

The audio comes from this YouTube video: https://www.youtube.com/watch?v=EdvILOSgPzY

Expected behavior

Could you clarify whether this is a missing capability of the model itself or a bug in decoding tokens from the model output?

Environment

I tried both locally and on the demo at https://www.modelscope.cn/studios/iic/SenseVoice. The results are the same in both cases.