VRCWizard / TTS-Voice-Wizard

Speech to Text to Speech. Song now playing. Sends text as OSC messages to VRChat to display on avatar. (STTTS) (Speech to TTS) (VRC STT System) (VTuber TTS)
https://TTSVoiceWizard.com
MIT License
579 stars 66 forks

Issue about Whisper Model recognition Chinese #36

Closed. XieLongWu closed this issue 1 year ago

XieLongWu commented 1 year ago

I have tested Whisper models of almost every size, and the same thing happens with all of them. When the recognition input language is set to Chinese and there is no voice input, the Whisper model keeps outputting spam. When it is set to English, the issue does not occur. I have not seen the issue with Japanese, and I have not tried other languages. I then switched the model to Vosk, and the issue did not occur there either.

When I set the input to a microphone with no signal at all, the Whisper models still output garbage, as the screenshot shows. [Screenshot: TTSVoiceWizard_2023-06-21_13-06-43]

The issue occurs more often when the input is the microphone I use every day and I am not saying anything.

I don't know what causes this problem. I tried to use text replacement to delete these spam messages, but the model often combines some common words at random and inserts spaces at random, which makes text replacement basically unusable.
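The reason plain text replacement fails here is that a fixed string no longer matches once spaces are inserted or phrases are recombined. A minimal sketch of a filter that ignores spacing, assuming the spam consists only of known phrases, could look like the following. The phrases in `SPAM_PHRASES` and the function names are hypothetical placeholders for illustration; they are not taken from the screenshot or from TTS Voice Wizard itself.

```python
import re

# Hypothetical placeholder phrases; replace with the spam actually observed.
SPAM_PHRASES = ["谢谢观看", "请订阅"]


def normalize(text: str) -> str:
    """Remove whitespace, punctuation, and underscores so spacing cannot defeat matching."""
    return re.sub(r"[\W_]+", "", text)


def should_drop(text: str, phrases=SPAM_PHRASES) -> bool:
    """Return True if the text is empty or made up entirely of known spam phrases."""
    remaining = normalize(text)
    # Remove longer phrases first so a short phrase cannot split a longer one.
    for phrase in sorted((normalize(p) for p in phrases), key=len, reverse=True):
        if phrase:
            remaining = remaining.replace(phrase, "")
    return remaining == ""
```

Because the check requires the whole output to consist of listed phrases, a real sentence that merely contains one of them is kept. This only works around the symptom; it does not stop the model from producing the output.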

I checked the Whisper model's GitHub repository, as well as some technical write-ups that use the Whisper model directly, but none of them seem to have this problem. It seems that the Whisper model can be set to recognize multiple languages at the same time. Is it possible to add startup commands manually, or to provide an option to select multiple languages? I wonder whether this problem would go away if multiple languages were recognized at the same time.

VRCWizard commented 1 year ago

This is an issue with the package I use, https://github.com/Const-me/Whisper/issues/54, where it hallucinates words when there are none. I'll be looking into adding a solution similar to the one proposed in that issue.
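One common way to mitigate hallucination on silence is to avoid sending silent audio to the recognizer at all. The sketch below is a simple energy gate. It is not the fix proposed in the linked issue, and it assumes 16-bit little-endian mono PCM input. The `transcribe` callback, the function names, and the threshold of 500 are assumptions for illustration, and the threshold would need tuning for each microphone.

```python
import math
import struct


def rms_level(pcm16: bytes) -> float:
    """Root-mean-square amplitude of 16-bit little-endian mono PCM audio."""
    count = len(pcm16) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", pcm16[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def has_speech(pcm16: bytes, threshold: float = 500.0) -> bool:
    """Crude voice check: treat the chunk as voiced if its RMS reaches the threshold."""
    return rms_level(pcm16) >= threshold


def transcribe_if_voiced(pcm16: bytes, transcribe, threshold: float = 500.0):
    """Call the (hypothetical) transcribe callback only for chunks that pass the gate."""
    if not has_speech(pcm16, threshold):
        return None  # silent chunk: skip recognition so nothing can be hallucinated
    return transcribe(pcm16)
```

An energy gate cannot tell speech from loud background noise, so a dedicated voice activity detector would be more robust where noise is a concern.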

XieLongWu commented 1 year ago

> This is an issue with the package I use, Const-me/Whisper#54, where it hallucinates words when there are none. I'll be looking into adding a solution similar to the one proposed in that issue.

Oh, I see. It turns out this is indeed an issue with Whisper. I'll close this issue and try to post one on Whisper. Thanks for your help.