jason-ni opened this issue 1 month ago (status: Open)
When I run ASR with SenseVoiceSmall on Japanese news audio, all English abbreviations are dropped from the transcript (for example, "EV" and "BYD" are missing). Below is a comparison of the SenseVoiceSmall output and the whisper.cpp output:
SenseVoice
今回のジャパンモビリティーショでは様々な電気自動車が展示されました 例えば中国のは4つの車輪それぞれに独立したモーターをつけた車を発表
whisper.cpp
[00:00:00.600 --> 00:00:06.540] 今回のジャパンモビリティショーでは様々なEV、電気自動車が展示されました
[00:00:06.540 --> 00:00:14.720] 例えば中国のBYDは4つの車輪それぞれに独立したモーターをつけた車を発表
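To make the difference between the two transcripts explicit, here is a minimal sketch that lists the Latin-letter tokens present in the whisper.cpp text but absent from the SenseVoice text. The two strings are copied from the outputs above, with the whisper.cpp timestamps stripped by hand; the helper name `latin_tokens` is only illustrative and not part of either tool.

```python
import re

# Transcripts copied from the outputs above (whisper.cpp timestamps removed by hand).
sensevoice = (
    "今回のジャパンモビリティーショでは様々な電気自動車が展示されました "
    "例えば中国のは4つの車輪それぞれに独立したモーターをつけた車を発表"
)
whisper_cpp = (
    "今回のジャパンモビリティショーでは様々なEV、電気自動車が展示されました "
    "例えば中国のBYDは4つの車輪それぞれに独立したモーターをつけた車を発表"
)


def latin_tokens(text):
    """Return every run of ASCII letters in the text, in order of appearance."""
    return re.findall(r"[A-Za-z]+", text)


# Tokens that whisper.cpp produced but SenseVoice did not.
sensevoice_tokens = set(latin_tokens(sensevoice))
missing = [t for t in latin_tokens(whisper_cpp) if t not in sensevoice_tokens]
print(missing)  # ['EV', 'BYD']
```

The SenseVoice transcript contains no Latin-letter tokens at all, so in this sample every abbreviation that whisper.cpp transcribed is missing.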
Audio comes from this youtube video: https://www.youtube.com/watch?v=EdvILOSgPzY
Could you clarify whether this is a missing capability of the model itself, or a bug in decoding tokens from the model output?
I tried this both locally and on the demo at https://www.modelscope.cn/studios/iic/SenseVoice. The results are the same in both cases.