SYSTRAN / faster-whisper

Faster Whisper transcription with CTranslate2
MIT License

When I retrieve audio data from the WebSocket client and process it with io.BytesIO(audio_bytes), it still throws an error. #361

Closed lukeewin closed 1 year ago

lukeewin commented 1 year ago

OS: Windows 11, Python version: 3.10.11, websockets version: 11.0.3, faster-whisper version: 0.7.0

When I retrieve user audio in real-time from the microphone through the client WebSocket and process it using the io.BytesIO() function, I encounter an error when passing it to the transcription function of faster-whisper.

error:

```
File "D:\Software\Anaconda\envs\RealTime\lib\site-packages\faster_whisper\audio.py", line 45, in decode_audio
    with av.open(input_file, metadata_errors="ignore") as container:
  File "av\container\core.pyx", line 401, in av.container.core.open
  File "av\container\core.pyx", line 272, in av.container.core.Container.__cinit__
  File "av\container\core.pyx", line 292, in av.container.core.Container.err_check
  File "av\error.pyx", line 336, in av.error.err_check
av.error.InvalidDataError: [Errno 1094995529] Invalid data found when processing input: ''
```

this is my code:

```python
async def process_audio_data(audio_frame, websocket):
    audio_bytes = b"".join(audio_frame)
    wav_stream = BytesIO(audio_bytes)
    segments, info = model.transcribe(
        wav_stream,
        beam_size=5,
        vad_filter=True,
        vad_parameters=dict(min_silence_duration_ms=500),
    )
    for segment in segments:
        message = json.dumps({"text": segment.text})
        await websocket.send(message)
```


guillaumekln commented 1 year ago

This does not seem to be an issue with faster-whisper. The error tells you the audio stream is invalid.

If you want more help you should provide a way to reproduce the error.
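For context on why the stream is rejected: `decode_audio` hands the file-like object to PyAV, which expects a real container format (WAV, WebM, MP3, …), not bare PCM samples. A minimal stdlib-only sketch of wrapping raw 16-bit mono PCM in an in-memory WAV container so a decoder can parse it (the sample data here is just one second of silence; `pcm_to_wav_stream` is a hypothetical helper name):

```python
import io
import wave

def pcm_to_wav_stream(pcm_bytes: bytes, sample_rate: int = 16000) -> io.BytesIO:
    """Wrap raw 16-bit mono PCM in a WAV container so decoders can parse it."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)
    buf.seek(0)
    return buf

stream = pcm_to_wav_stream(b"\x00\x00" * 16000)  # one second of silence
print(stream.read(4))  # b'RIFF' -- a valid WAV header
```

A bare `io.BytesIO(audio_bytes)` of raw microphone samples has no such header, which is consistent with the `InvalidDataError` above.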

nobody4t commented 1 year ago

I am interested in this too. Please share more details and maybe I can help.

lukeewin commented 1 year ago

this is my code:


```python
import asyncio
import json
import wave
import pyaudio
import websockets
from faster_whisper import WhisperModel
import os
import numpy as np
import sounddevice as sd

stream = sd.OutputStream(samplerate=16000, channels=1)
stream.start()

os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

def initWhisperModel():
    model_size = "tiny"
    model = WhisperModel(model_size, device="cpu", compute_type="int8", num_workers=8, local_files_only=True)
    return model

def write_wav(out_path, audio_data):
    wf = wave.open(out_path, 'wb')
    wf.setnchannels(1)
    p = pyaudio.PyAudio()
    wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wf.setframerate(16000)
    wf.writeframes(audio_data)
    wf.close()

async def process_audio_data(audio_frame, websocket):
    segments, info = model.transcribe(audio_frame, beam_size=5, language="zh", initial_prompt="转录成简体中文", vad_filter=True, vad_parameters=dict(min_silence_duration_ms=500))
    for segment in segments:
        message = json.dumps({"text": segment.text})
        await websocket.send(message)

async def receive_audio_data(websocket, path):
    try:
        async for message in websocket:
            if isinstance(message, bytes):
                # Convert the byte string to a numpy array
                audio_data = np.frombuffer(message, dtype=np.float32)
                await process_audio_data(audio_data, websocket)
    except websockets.ConnectionClosed:
        print("WebSocket connection closed.")

model = initWhisperModel()
start_server = websockets.serve(receive_audio_data, 'localhost', 8765, subprotocols=["binary"], ping_interval=None)
asyncio.get_event_loop().run_until_complete(start_server)
asyncio.get_event_loop().run_forever()
```


The code execution did not produce any transcription results.
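(Editor's note: one possible cause of the silent failure above is a sample-format mismatch. `np.frombuffer(message, dtype=np.float32)` only yields valid audio if the client really sends raw little-endian float32 samples; if it sends 16-bit PCM, the reinterpreted buffer is noise. A sketch, assuming the client sends raw little-endian int16 PCM at 16 kHz, of converting it to the normalized float32 array that `model.transcribe` can also accept directly; `pcm16_to_float32` is a hypothetical helper name:)

```python
import numpy as np

def pcm16_to_float32(pcm_bytes: bytes) -> np.ndarray:
    """Convert raw little-endian 16-bit PCM bytes to normalized float32 samples."""
    samples = np.frombuffer(pcm_bytes, dtype=np.int16)
    return samples.astype(np.float32) / 32768.0

# Example: a buffer of four int16 samples
raw = np.array([0, 16384, -16384, 32767], dtype=np.int16).tobytes()
audio = pcm16_to_float32(raw)
print(audio)  # values lie in [-1.0, 1.0)
```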
guillaumekln commented 1 year ago

Did you resolve the initial error? If yes, can we close this issue?

JayAdityaNautiyal commented 1 year ago

I am also facing the same issue:

`Invalid data found when processing input: '<none>'`

Basically, we want to capture real-time audio on the client side and stream it to the backend as data buffers. But faster-whisper transcribes only the initial audio stream, and when the second stream arrives as a data buffer, it throws:

`Invalid data found when processing input: '<none>'`

These are the client-side inspection logs, where we can see that it works for the first audio stream, i.e. "Let's Go", and fails for the second audio stream. However, it works again for the first stream if the client connection is refreshed.

[screenshot: client-side console logs]

And this is the error log in the server side:

[screenshot: server-side error log]

And this is how we use faster-whisper in code:

[screenshot: faster-whisper usage in code]

Kindly help, guys! We're stuck on this and can't move forward!

guillaumekln commented 1 year ago

Please provide a minimal client/server code to reproduce the error.

JayAdityaNautiyal commented 1 year ago

Kindly find the code in the dev branch of the repo below; the steps to reproduce the error are in the README.

https://github.com/sumitesh9/RealTimeTranscription.git

JayAdityaNautiyal commented 1 year ago

Hope you reproduced the error from the code shared @guillaumekln

Kindly share any insight that could help us get our project out of this dead end.

shunnNet commented 1 year ago

@JayAdityaNautiyal I tried to reproduce with the code you provided. Maybe the reason is that the bytes from the second round are missing some required data (perhaps the audio metadata, which I think is in the first file chunk), so they cannot be parsed by the py-av module successfully.

The following may be a temporary work around: in your backend/app.py

```python
# backend/app.py
async def handle_websocket(websocket, path):
    print(f"Client connected to {path}")

    # save whole bytes from client
    binary_total = b""

    try:
        async for binary_data in websocket:
            # concat new binary_data with old binary data
            binary_total += binary_data

            # and pass them all to io.BytesIO
            dataBuffer = io.BytesIO(binary_total)

            # pass to transcribe
            segments, info = model.transcribe(dataBuffer, beam_size=5)

            # omit......
```

sumitesh9 commented 1 year ago

@shunnNet This worked for us. Thanks a lot.

JayAdityaNautiyal commented 1 year ago

thanks a lot @shunnNet

This worked.

Mikehade commented 7 months ago

I had this same issue when implementing with Django and WebSockets. The solution I came up with is to ensure that each chunk is sent as separate, standalone audio data from the client side. Initially I treated the chunks as a single audio recording split into bytes, which leads to corrupted data because later chunks lose the metadata and the audio container header. This approach ensures that the chunks sent from the client side are not corrupted and arrive as standalone audio data on the server side. I don't know who this might help, though.
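(Editor's note: the point above can be illustrated with a stdlib-only sketch. If you record once and slice the resulting byte stream, only the first slice carries the container header, so every later slice is undecodable on its own. Here a WAV file stands in for the recorded stream:)

```python
import io
import wave

# Build a small valid WAV in memory (stand-in for a recorded stream)
buf = io.BytesIO()
with wave.open(buf, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 8000)  # half a second of silence
data = buf.getvalue()

# Slicing one recording into "chunks": only the first slice has the header
first, second = data[: len(data) // 2], data[len(data) // 2 :]
print(first[:4])   # b'RIFF' -> this chunk is decodable on its own
print(second[:4])  # raw samples with no header -> a decoder rejects this chunk
```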

bakazhou commented 2 weeks ago

@Mikehade So happy to see you already solved this issue. Could you please share the solution with me? I'm facing the same issue, and it makes me feel so confused.

Mikehade commented 2 weeks ago

@bakazhou, you just have to start a new recording to generate a new file after every chunk so it does not get corrupted, as seen in the HTML snippet below, since you lose the audio metadata if you do not.

```javascript
const socket = new WebSocket(`ws://localhost:port/path`);
socket.binaryType = 'arraybuffer'; // Set binary type for the socket connection

socket.onopen = () => {
    document.querySelector('#status').textContent = 'Connected';
    console.log({ event: 'onopen' });

    // Function to record and send chunks
    // `stream` is the MediaStream from navigator.mediaDevices.getUserMedia (not shown here)
    function recordAndSend(stream) {
        const mediaRecorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });

        mediaRecorder.ondataavailable = e => {
            if (e.data.size > 0) {
                // Send the chunk to the server
                socket.send(e.data);
            }
        };

        mediaRecorder.onstop = e => {
            // Handle the end of recording here if needed
        };

        setTimeout(() => mediaRecorder.stop(), CHUNK_DURATION_MS); // Stop recording after the specified chunk duration
        mediaRecorder.start();
    }

    // Generate a new file every specified chunk duration
    setInterval(() => recordAndSend(stream), CHUNK_DURATION_MS);
};
```

You also have to set your chunk duration (`CHUNK_DURATION_MS`).