alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

I am trying to recognize speech from Discord channel audio, but Vosk is putting out empty strings #1634

Open saipavankumar-muppalaneni opened 1 month ago

saipavankumar-muppalaneni commented 1 month ago

I have tried every setting I can think of for the model, sample rate, and channels, but I cannot get any recognized speech out of Vosk, only empty strings. I tried the same sample on free speech-recognition websites and they all transcribed it fine.
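Before swapping models again, it can help to confirm what the WAV file actually contains. The sketch below uses only the standard library; `describe_wav` is a hypothetical helper, not part of Vosk. It reports the header fields that the `transcribe_audio` check looks at, plus the peak 16-bit amplitude, since a silent (all-zero) recording would also produce empty transcripts.

```python
import array
import sys
import wave


def describe_wav(path):
    """Return the WAV header fields Vosk cares about plus the peak amplitude."""
    with wave.open(path, "rb") as wf:
        info = {
            "channels": wf.getnchannels(),
            "sample_width": wf.getsampwidth(),  # bytes per sample; 2 means 16-bit
            "frame_rate": wf.getframerate(),
            "comptype": wf.getcomptype(),  # "NONE" means uncompressed PCM
            "frames": wf.getnframes(),
            "peak": None,  # only computed for 16-bit audio
        }
        if info["sample_width"] == 2:
            samples = array.array("h", wf.readframes(info["frames"]))
            if sys.byteorder == "big":
                samples.byteswap()  # WAV sample data is little-endian
            info["peak"] = max((abs(s) for s in samples), default=0)
    rate = info["frame_rate"]
    info["seconds"] = info["frames"] / rate if rate else 0.0
    return info
```

For a file Vosk can use, one would expect `channels == 1`, `sample_width == 2`, `comptype == "NONE"`, a `frame_rate` matching what the model expects (16000 for the small English models), and a `peak` well above zero.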

import json
import wave

from vosk import KaldiRecognizer, Model

model = None
recognizer = None


def transcribe_audio(audio_file):
    global model, recognizer
    if not model:
        print("Error: Vosk model not initialized.")
        return

    with wave.open(audio_file, "rb") as wf:
        # Vosk expects uncompressed 16-bit mono PCM
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2 or wf.getcomptype() != "NONE":
            print("Audio file must be WAV format mono PCM.")
            return

        recognizer = KaldiRecognizer(model, wf.getframerate())
        while True:
            data = wf.readframes(4000)
            if len(data) == 0:
                break
            if recognizer.AcceptWaveform(data):
                # Result() returns a JSON string; parse it instead of slicing result[14:-3]
                print(json.loads(recognizer.Result())["text"])

        # Call FinalResult() only once: the first call flushes the last utterance,
        # so testing it in an `if` and calling it again printed an empty result
        print(json.loads(recognizer.FinalResult())["text"])


def init_vosk():
    global model
    if not model:
        try:
            model = Model(model_name="vosk-model-small-en-us-0.15")
            print("Vosk model loaded successfully.")
        except Exception as e:
            print(f"Error loading Vosk model: {e}")
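Two details in this kind of loop can produce empty output on their own: `FinalResult()` should be called exactly once, since it flushes the decoder and returns the last (possibly only) utterance, and the JSON string is safer to parse than to slice at fixed offsets, which depend on Vosk's exact whitespace and key order. A minimal sketch that isolates the loop; `collect_transcript` is a hypothetical helper that accepts any object exposing `AcceptWaveform`, `Result` and `FinalResult`, such as `vosk.KaldiRecognizer`.

```python
import json


def collect_transcript(recognizer, chunks):
    """Feed PCM byte chunks to a Vosk-style recognizer and return the joined text."""
    pieces = []
    for chunk in chunks:
        if not chunk:
            continue  # skip empty reads
        if recognizer.AcceptWaveform(chunk):
            # An utterance ended; Result() returns JSON such as {"text": "..."}
            pieces.append(json.loads(recognizer.Result()).get("text", ""))
    # Call FinalResult() once to flush whatever is still buffered
    pieces.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)
```

With Vosk, the chunks can come straight from the WAV file, e.g. `collect_transcript(KaldiRecognizer(model, wf.getframerate()), iter(lambda: wf.readframes(4000), b""))`. Returning the text instead of printing it also makes it easy to tell "recognizer heard nothing" apart from "result was lost".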

nshmyrev commented 1 month ago

Make sure the input audio data has the correct format.
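Following up on the format advice: Discord voice audio is typically delivered as 16-bit, 48 kHz, stereo PCM, whereas the check in `transcribe_audio` rejects stereo and the small English models expect 16 kHz mono. The sketch below is a stdlib-only conversion under those assumptions; `to_vosk_wav` is a hypothetical helper. It downmixes by averaging channels and downsamples by averaging groups of `factor` samples, which is only a crude low-pass filter. For anything beyond a quick test, a proper resampler such as `ffmpeg -i in.wav -ac 1 -ar 16000 -sample_fmt s16 out.wav` is the safer choice.

```python
import array
import sys
import wave


def to_vosk_wav(pcm, out_path, in_rate=48000, in_channels=2, factor=3):
    """Write interleaved 16-bit little-endian PCM as a mono WAV at in_rate // factor.

    Returns the number of output frames written.
    """
    samples = array.array("h", pcm)
    if sys.byteorder == "big":
        samples.byteswap()  # interpret the incoming bytes as little-endian
    # Downmix: average the channels of each interleaved frame
    mono = [
        sum(samples[i:i + in_channels]) // in_channels
        for i in range(0, len(samples) - in_channels + 1, in_channels)
    ]
    # Decimate: average every `factor` consecutive samples, dropping any remainder
    out = array.array("h", (
        sum(mono[i:i + factor]) // factor
        for i in range(0, len(mono) - factor + 1, factor)
    ))
    if sys.byteorder == "big":
        out.byteswap()  # WAV sample data is little-endian
    with wave.open(out_path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(in_rate // factor)
        wf.writeframes(out.tobytes())
    return len(out)
```

With the defaults, 48 kHz stereo input becomes 16 kHz mono output that passes the `transcribe_audio` format check and can then be passed to `KaldiRecognizer(model, 16000)`.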