Closed by Kreevoz 7 months ago
I have seen similar instances, and I believe this is the model hallucinating. There may be audio artifacts in the samples that trigger the hallucination, but I am uncertain. This could happen if there is ambient sound. If no mic is attached and the audio sample is actually silent, I would expect the model to transcribe a blank message.
Indeed, I'd expect so too. I hooked the input to one of the digital inputs for testing purposes and made sure there was zero noise in it, just empty samples. It still led to that hallucination, so I'm a bit puzzled. 🤔
I'm having the same problem as well.
Using the binary downloaded from releases, it doesn't work at all. I tested with a microphone and got nothing. I also tried Stereo Mix, but because I'm using a Bluetooth speaker, that won't work either.
I then tried setting up the Python build and got a bit further. This is what I got lol. Again, using microphone.
I changed the level for audio detection and that stopped it.
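For anyone else hitting this, here is a minimal sketch of the kind of level gate that raising the detection level amounts to. It is not this project's actual code: the names `rms` and `should_transcribe` and the `0.01` threshold are hypothetical, and samples are assumed to be floats in the range [-1.0, 1.0]. The idea is simply to skip transcription when a chunk is too quiet, so the model never sees pure silence.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a chunk of samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def should_transcribe(samples: Sequence[float], threshold: float = 0.01) -> bool:
    """Return True only when the chunk is louder than the threshold.

    The threshold value is a hypothetical default, not a project setting.
    """
    return rms(samples) > threshold


# One second of digital silence at an assumed 16 kHz sample rate.
silent_chunk = [0.0] * 16000

# One second of a quiet 220 Hz tone standing in for real input.
speech_like_chunk = [
    0.1 * math.sin(2 * math.pi * 220 * n / 16000) for n in range(16000)
]

if should_transcribe(silent_chunk):
    print("would transcribe the silent chunk")
if should_transcribe(speech_like_chunk):
    print("would transcribe the tone chunk")
```

With a gate like this, a fully muted input never reaches the model, which would sidestep the hallucinated line rather than explain it.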
With the audio input perfectly muted (no samples coming in) and the 'large' model chosen, the transcription does not stay empty. Instead, every time it is executed, "you" is written onto a new line followed by a linebreak.
The frequency with which this happens is directly tied to the seconds_of_silence_between_lines / transcribe_rate in the settings file.
Is there some sort of audio artifact added to the buffer as a result of chopping the input into chunks? I'm not sure why the large model specifically hallucinates this particular output, but it still happens.
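One way to check the artifact theory would be to look at the peak sample value of each chunk right before it is handed to the model. The sketch below is only an illustration under assumptions: it supposes chunks are raw little-endian 16-bit PCM bytes, and `peak_int16` is a hypothetical helper, not something from this project. If a supposedly muted input ever yields a non-zero peak, something in the chunking path is adding samples.

```python
import struct


def peak_int16(pcm: bytes) -> int:
    """Largest absolute sample value in a little-endian 16-bit PCM buffer."""
    count = len(pcm) // 2
    if count == 0:
        return 0
    # Ignore a trailing odd byte, if any, so unpack always gets whole samples.
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return max(abs(s) for s in samples)


# 1600 samples of digital silence (all-zero bytes).
silent = bytes(2 * 1600)

# The same buffer with a single non-zero sample at the end,
# standing in for a click introduced at a chunk boundary.
with_click = silent[:-2] + struct.pack("<h", 300)

print(peak_int16(silent))      # 0
print(peak_int16(with_click))  # 300
```

If the peak stays at 0 for every chunk and "you" still appears, the buffer is clean and the output is coming from the model itself.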