alphacep / vosk-api

Offline speech recognition API for Android, iOS, Raspberry Pi and servers with Python, Java, C# and Node
Apache License 2.0

not getting transcribed text #1436

Open ankur995 opened 10 months ago

ankur995 commented 10 months ago

I am sending audio blobs from MediaRecorder to a Django WebSocket like this:

    navigator.mediaDevices.getUserMedia({ audio: true }).then((stream) => {
      const mediaRecorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
      const socket = new WebSocket('ws://localhost:8000/ws/transcribe/');

      mediaRecorder.addEventListener('dataavailable', async (event) => {
        if (event.data.size > 0) {
          const audioBlob = await event.data.arrayBuffer();
          socket.send(audioBlob);
          console.log(audioBlob);
        }
      });

      mediaRecorder.start(250); // emit a dataavailable chunk every 250 ms (interval chosen for illustration)
    });

But I am not able to convert the audio blob to a format Vosk supports; I have tried everything. Earlier I saved the audio to a file and then converted it, and that worked fine, but now I am trying to do real-time transcription and that is not working.
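For context, the save-then-convert path that did work is the standard file-based Vosk usage: the recognizer expects 16-bit mono PCM at a known sample rate. A minimal sketch, assuming an already-converted audio.wav and an unpacked model directory named model (both paths are illustrative):

    import json
    import wave
    from vosk import Model, KaldiRecognizer

    wf = wave.open("audio.wav", "rb")          # must be 16-bit mono PCM
    model = Model("model")
    rec = KaldiRecognizer(model, wf.getframerate())

    while True:
        data = wf.readframes(4000)
        if len(data) == 0:
            break
        rec.AcceptWaveform(data)

    # FinalResult() flushes the recognizer and returns the last utterance.
    print(json.loads(rec.FinalResult())["text"])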

ankur995 commented 10 months ago

I was not able to convert it with this:

    # requires: import io, wave; from pydub import AudioSegment
    async def convert_to_wav(self, audio_blob):
        sample_rate = 16000
        try:
            audio = AudioSegment.from_file(io.BytesIO(audio_blob), format="webm")
            audio = audio.set_frame_rate(sample_rate).set_channels(1).set_sample_width(2)

            wav_buffer = io.BytesIO()
            with wave.open(wav_buffer, 'wb') as wav_file:
                wav_file.setnchannels(1)
                wav_file.setsampwidth(2)
                wav_file.setframerate(sample_rate)
                wav_file.writeframes(audio.raw_data)

            return wav_buffer.getvalue()
        except Exception as e:
            print("Error converting to WAV:", str(e))
            return None

I tried another approach as well, but I am not getting the desired result:

    async def convert_to_wav(self, audio_blob):
        sample_rate = 16000
        try:
            wav_buffer = io.BytesIO()

            cmd = [
                'ffmpeg', '-loglevel', 'quiet', '-f', 's16le', '-ar', str(sample_rate),
                '-ac', '1', '-i', '-', '-f', 'wav', '-'
            ]

            process = await asyncio.create_subprocess_exec(
                *cmd,
                stdin=asyncio.subprocess.PIPE,
                stdout=asyncio.subprocess.PIPE,
                stderr=asyncio.subprocess.PIPE
            )

            process.stdin.write(audio_blob)
            process.stdin.close()

            wav_data = await process.stdout.read()
            await process.wait()

            wav_buffer.write(wav_data)

            return wav_buffer.getvalue()
        except Exception as e:
            print("Error converting to WAV:", str(e))
            return None
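Two things stand out in the attempts above. First, MediaRecorder produces one continuous WebM stream, so only the first dataavailable chunk carries the container header and later chunks cannot be decoded in isolation; both converters will therefore fail on everything after the first chunk. Second, the ffmpeg command declares its input as raw s16le even though the piped data is WebM. A sketch of a corrected invocation, assuming the whole stream (not a lone mid-stream chunk) is piped in and ffmpeg is on the PATH; the helper name webm_to_pcm is illustrative:

    import asyncio

    async def webm_to_pcm(webm_bytes: bytes) -> bytes:
        # Let ffmpeg probe the WebM container itself, and emit headerless
        # 16-bit little-endian mono PCM at 16 kHz, which can be passed
        # straight to KaldiRecognizer.AcceptWaveform().
        cmd = [
            'ffmpeg', '-loglevel', 'quiet',
            '-i', '-',                                  # WebM in on stdin
            '-f', 's16le', '-ar', '16000', '-ac', '1',  # raw PCM out
            '-',
        ]
        process = await asyncio.create_subprocess_exec(
            *cmd,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
        )
        # communicate() closes stdin and drains stdout concurrently,
        # avoiding the pipe deadlock that write()/read() can run into.
        pcm, _ = await process.communicate(input=webm_bytes)
        return pcm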
nshmyrev commented 10 months ago

It is not trivial to convert a WebM stream. Use WAV format for MediaRecorder instead. In general, we have client samples in the vosk-server project; you can check the web example there. If you want a fast response, use WebRTC instead of WebSocket.
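For orientation, the WebSocket client samples follow the same shape as vosk-server itself: the browser sends raw 16-bit mono PCM chunks, and the server feeds each chunk to a per-connection recognizer. A minimal sketch of that server side (not the actual vosk-server code; it assumes the websockets library, version 13 or later, and a model directory named model):

    import asyncio
    import websockets
    from vosk import Model, KaldiRecognizer

    model = Model("model")  # unpacked Vosk model directory

    async def recognize(websocket):
        # One recognizer per connection; the rate must match the PCM
        # the client actually sends.
        rec = KaldiRecognizer(model, 16000)
        async for chunk in websocket:
            if rec.AcceptWaveform(chunk):       # True => utterance ended
                await websocket.send(rec.Result())
            else:
                await websocket.send(rec.PartialResult())

    async def main():
        async with websockets.serve(recognize, "localhost", 2700):
            await asyncio.Future()              # serve forever

    asyncio.run(main())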

ankur995 commented 10 months ago

Can you please provide me the links for the client samples and for WebRTC?

nshmyrev commented 10 months ago

https://github.com/alphacep/vosk-server/tree/master/client-samples

https://github.com/alphacep/vosk-server/tree/master/webrtc

ankur995 commented 10 months ago

Thank you so much! Let me explore this. One more question: is there any Vosk model for en-hi that can transcribe both Hindi and English, i.e. a model that writes Hindi words in Latin script and English words in English?

nshmyrev commented 10 months ago

We have an initial model like this; it is not yet released.

ankur995 commented 10 months ago

For sending audio through the WebSocket I am now using this:

    function sendAudio(audioDataChunk) {
      if (webSocket.readyState === WebSocket.OPEN) {
        const inputData = audioDataChunk.inputBuffer.getChannelData(0) || new Float32Array(bufferSize);
        const targetBuffer = new Int16Array(inputData.length);
        for (let index = inputData.length - 1; index >= 0; index--) {
          targetBuffer[index] = 32767 * Math.min(1, inputData[index]);
        }
        webSocket.send(targetBuffer.buffer);
        console.log(targetBuffer.buffer);
      }
    }

and for transcribing I am doing this:

    async def transcribe_audio(self, audio_data):
        try:
            print("Transcribing audio...")
            if not self.recognizer:
                print("Recognizer not initialized.")
                return ""

            self.recognizer.AcceptWaveform(audio_data)
            print("Waveform accepted.")

            result = json.loads(self.recognizer.Result())
            print("Recognition result:", result)
            transcribed_text = result.get("text", "").strip()

            partial_result = self.recognizer.PartialResult()
            if partial_result:
                partial_text = json.loads(partial_result).get("text", "").strip()
                print("Partial Transcription:", partial_text)

            print("Transcription:", transcribed_text)
            return transcribed_text
        except Exception as e:
            print("Recognition error:", str(e))
            return ""

Here I am getting transcribed text, but it is missing a few words. I am using a sample rate of 8000 Hz.
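A plausible cause of the missing words is the control flow above rather than the audio itself: in the Vosk API, AcceptWaveform() returns true only when an utterance boundary has been detected, and Result() should be read only then; calling Result() after every chunk finalizes the recognizer mid-utterance and loses words at chunk boundaries. Note also that PartialResult() returns its text under the "partial" key, not "text", and that the recognizer must be constructed with the sample rate the browser actually delivers (an AudioContext typically runs at 44100 or 48000 Hz unless the stream is explicitly resampled to 8000 Hz). A sketch of the usual per-chunk pattern, keeping the names from the comment above:

    import json

    async def transcribe_audio(self, audio_data):
        # Read a final result only when Vosk signals the end of an
        # utterance; otherwise just report the interim hypothesis.
        if self.recognizer.AcceptWaveform(audio_data):
            result = json.loads(self.recognizer.Result())
            return result.get("text", "").strip()

        partial = json.loads(self.recognizer.PartialResult())
        print("Partial:", partial.get("partial", ""))
        return ""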