Open theman23290 opened 12 months ago
This seems to be related to this Whisper discussion: https://github.com/openai/whisper/discussions/679 TL;DR: implement --condition_on_previous_text and VAD, and the issues go away. Is there any way to implement that fix in this project?
That's for @Tony-sama to consider.
Check the recent commit. Is that what you asked for?
I believe so. The change still didn't fix the original issue, though. I don't know if this is a Whisper issue or an issue with how Whisper is implemented in this code. Here is the terminal output while a client is connected through the API.
/home/senpai/miniconda/envs/extras/lib/python3.11/site-packages/whisper/transcribe.py:115: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
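For anyone reading along: that warning is harmless by itself; it just means Whisper fell back to FP32 on CPU. Below is a rough sketch of the keyword arguments one could pass to `model.transcribe()` to silence it and to try the setting from the linked discussion. The helper name `transcribe_options` is made up for illustration and is not part of Extras. Setting `condition_on_previous_text` to `False` is my reading of the linked discussion, so treat it as an assumption.

```python
def transcribe_options(device: str) -> dict:
    """Build keyword arguments for whisper's model.transcribe().

    Hypothetical helper, not part of SillyTavern-Extras.
    """
    return {
        # Requesting FP32 explicitly on CPU avoids the
        # "FP16 is not supported on CPU" warning.
        "fp16": device != "cpu",
        # Assumption: the linked discussion suggests disabling this so that
        # one bad segment does not influence the segments that follow it.
        "condition_on_previous_text": False,
    }


# Possible usage, assuming `model` was loaded with whisper.load_model(...):
# result = model.transcribe("audio.wav", **transcribe_options("cpu"))
```

Whether this changes the streaming behaviour is untested here; it only covers the warning and the first half of the suggested fix.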
Hi, I had the same problem and all I did was leave it for a week, reboot it, and it (for whatever reason) worked perfectly after that. I wish I could be more helpful than that, but I had the same problem with my installation of whisper. https://github.com/SillyTavern/SillyTavern-Extras/issues/217
When speech recognition is streaming, the transcribed output is always "you". It uses Whisper for the transcription. When I specifically use Whisper and click on the microphone, it works perfectly. But when streaming, the terminal only shows the word "you", even if I don't say anything. I can confirm the microphone is activated when recording the audio. I have used SillyTavern on Windows 11, Debian, and modded Debian with the same result. Any recommendations on what I can do to resolve this? I am on the latest ffmpeg, running the latest Extras in conda, and have enough horsepower to run the Extras program as intended.
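Getting "you" when nothing is said looks like Whisper producing text for silent audio, which is what the VAD suggestion in this thread is meant to prevent. Here is a minimal sketch of the idea using a plain energy threshold instead of a real VAD model. Everything in it is an assumption for illustration: the function names, the 0.01 threshold, and the expectation that samples are floats in the range -1.0 to 1.0. It is not how Extras currently handles streaming audio.

```python
import math
from typing import Callable, Optional, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a chunk of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def transcribe_if_speech(
    samples: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],
    threshold: float = 0.01,  # assumed value; would need tuning per microphone
) -> Optional[str]:
    """Skip transcription when the chunk is too quiet to contain speech.

    `transcribe` is any callable that turns samples into text, for example
    a small wrapper around a Whisper model.
    """
    if rms(samples) < threshold:
        return None  # treat as silence, do not call the model at all
    return transcribe(samples)
```

A silent chunk such as `[0.0] * 16000` returns `None` without ever reaching the model, so nothing would be printed for it. A proper VAD would be more robust against background noise than this energy gate.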