davabase / transcriber_app

Real time speech to text transcription app.

Transcript output writes "you" if using Large model #8

Closed: Kreevoz closed this issue 7 months ago

Kreevoz commented 1 year ago

With the audio input perfectly muted (no samples coming in) and the 'large' model selected, every transcription pass writes "you" on a new line, followed by a line break, instead of doing nothing.

The frequency with which this happens is directly tied to the seconds_of_silence_between_lines / transcribe_rate in the settings file.

Is there some sort of audio artifact added to the buffer as a result of chopping the input into chunks? I'm not sure why the large model specifically hallucinates this particular output, but it still happens.

davabase commented 1 year ago

I have seen similar instances, and I believe this is the model hallucinating. There may be audio artifacts in the samples that trigger the hallucination, but I am uncertain. This could happen if there is ambient sound. If there is no mic attached and the audio sample is actually silent, I would expect the model to transcribe a blank message.
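One possible guard against this kind of hallucination, sketched under the assumption that the app uses OpenAI's Whisper and receives a result dictionary whose "segments" entries carry a "no_speech_prob" value: drop any segment the model itself rates as probably not speech. The function name drop_silent_segments and the 0.6 cutoff are illustrative choices, not part of transcriber_app.

```python
from typing import Any, Dict, List


def drop_silent_segments(result: Dict[str, Any],
                         max_no_speech_prob: float = 0.6) -> str:
    """Join segment texts, skipping segments that are probably silence.

    Assumes `result` is shaped like a Whisper transcription result:
    {"segments": [{"text": str, "no_speech_prob": float}, ...]}.
    """
    kept: List[str] = []
    for segment in result.get("segments", []):
        # A high no_speech_prob means the model doubts there was speech,
        # so text attached to that segment is likely hallucinated.
        if segment.get("no_speech_prob", 0.0) > max_no_speech_prob:
            continue
        text = segment.get("text", "").strip()
        if text:
            kept.append(text)
    return " ".join(kept)
```

With a filter like this, a lone "you" segment produced from silence would be discarded only if the model also reports a high no-speech probability for it, so it is a mitigation to try, not a guaranteed fix.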

Kreevoz commented 1 year ago

Indeed, I'd expect so too. I hooked the input to one of the digital inputs for testing purposes and made sure there was 0 noise in it entirely, just empty samples. It still led to that hallucination, so I'm a bit puzzled. 🤔

chaoscreater commented 12 months ago

I'm having the same problem as well.

Using the binary downloaded from releases, it doesn't work at all. I'm using a microphone to test and it didn't work. Also tried Stereo Mix but because I'm using a bluetooth speaker, it won't work.

I then tried setting up the Python build and got a bit further. This is what I got lol. Again, using microphone.

image

wifiuk commented 8 months ago

I changed the level for audio detection and it stopped this.
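For anyone wondering why a detection level would help: a minimal sketch of the idea, assuming audio chunks arrive as float samples in the range -1.0 to 1.0. Chunks whose RMS level falls below a threshold are never sent to the model, so there is nothing for it to hallucinate on. The names rms_level and transcribe_if_audible and the 0.01 threshold are hypothetical and not taken from the app's settings file.

```python
import math
from typing import Callable, Sequence


def rms_level(samples: Sequence[float]) -> float:
    """Root-mean-square level of a chunk; 0.0 for an empty chunk."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def transcribe_if_audible(samples: Sequence[float],
                          transcribe: Callable[[Sequence[float]], str],
                          threshold: float = 0.01) -> str:
    """Call `transcribe` only when the chunk is louder than `threshold`."""
    if rms_level(samples) < threshold:
        # Too quiet: skip the model entirely and emit nothing.
        return ""
    return transcribe(samples)
```

Raising the threshold suppresses more quiet input at the cost of possibly dropping soft speech, so the right value depends on the microphone and room.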