Closed — lukeewin closed this issue 1 year ago
This does not seem to be an issue with faster-whisper. The error tells you the audio stream is invalid.
If you want more help you should provide a way to reproduce the error.
I am interested in this too. Please share more details and maybe I can help.
this is my code:
```python
import asyncio
import json
import wave
import pyaudio
import websockets
from faster_whisper import WhisperModel
import os
import numpy as np
import sounddevice as sd

stream = sd.OutputStream(samplerate=16000, channels=1)
stream.start()

os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

def initWhisperModel():
    model_size = "tiny"
    model = WhisperModel(model_size, device="cpu", compute_type="int8",
                         num_workers=8, local_files_only=True)
    return model

def write_wav(out_path, audio_data):
    wf = wave.open(out_path, 'wb')
    wf.setnchannels(1)
    p = pyaudio.PyAudio()
    wf.setsampwidth(p.get_sample_size(pyaudio.paInt16))
    wf.setframerate(16000)
    wf.writeframes(audio_data)
    wf.close()

async def process_audio_data(audio_frame, websocket):
    segments, info = model.transcribe(
        audio_frame, beam_size=5, language="zh",
        initial_prompt="转录成简体中文",  # "Transcribe into Simplified Chinese"
        vad_filter=True,
        vad_parameters=dict(min_silence_duration_ms=500))
    for segment in segments:
        message = json.dumps({"text": segment.text})
        await websocket.send(message)

async def receive_audio_data(websocket, path):
    try:
        async for message in websocket:
            if isinstance(message, bytes):
                audio_data = np.frombuffer(message, dtype=np.float32)
                await process_audio_data(audio_data, websocket)
    except websockets.ConnectionClosed:
        print("WebSocket connection closed.")

model = initWhisperModel()
start_server = websockets.serve(receive_audio_data, 'localhost', 8765,
                                subprotocols=["binary"], ping_interval=None)
asyncio.get_event_loop().run_until_complete(start_server)
asyncio.get_event_loop().run_forever()
```
The code execution did not produce any transcription results.
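One possible cause of empty results with this setup is that each websocket message carries only a tiny slice of audio, too short for the VAD filter and the model to produce anything. A minimal sketch (the `AudioBuffer` class and its names are my own, not from faster-whisper) that buffers float32 samples until at least one second has accumulated before transcribing:

```python
import numpy as np

SAMPLE_RATE = 16000
MIN_SAMPLES = SAMPLE_RATE  # buffer at least 1 second of audio before transcribing

class AudioBuffer:
    """Accumulates float32 PCM chunks until there is enough audio to transcribe."""

    def __init__(self):
        self._chunks = []
        self._length = 0

    def add(self, chunk: np.ndarray):
        self._chunks.append(chunk)
        self._length += len(chunk)

    def pop_if_ready(self):
        """Return all buffered audio as one array once >= MIN_SAMPLES, else None."""
        if self._length < MIN_SAMPLES:
            return None
        audio = np.concatenate(self._chunks)
        self._chunks, self._length = [], 0
        return audio

# Inside receive_audio_data, instead of transcribing every message (sketch):
# buf.add(np.frombuffer(message, dtype=np.float32))
# audio = buf.pop_if_ready()
# if audio is not None:
#     segments, info = model.transcribe(audio, beam_size=5, language="zh")
```

faster-whisper's `transcribe` accepts a float32 NumPy array directly, so this avoids container parsing entirely, assuming the client really sends raw 16 kHz mono float32 samples.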
Did you resolve the initial error? If yes, can we close this issue?
I am also facing the same issue :
Invalid data found when processing input: '<none>'
Basically, we are capturing real-time audio on the client side and streaming it as data buffers to the backend. faster-whisper transcribes the initial audio stream only; when the second stream arrives as a data buffer, it throws an error:
Invalid data found when processing input: '<none>'
These are the client-side inspection logs, where we can see that it works for the first audio stream (i.e. "Let's Go") and fails for the second audio stream. However, it works again for the first stream if the client connection is refreshed.
And this is the error log on the server side:
And this is how we use faster-whisper in code:
Kindly help, guys! We are stuck on this and cannot move forward.
Please provide a minimal client/server code to reproduce the error.
Kindly find the code in the dev branch of the repo below; the steps to reproduce the error are in the README.
Hope you reproduced the error from the code shared @guillaumekln
Kindly let me know of any insight that could help us get our project out of this deadlock.
@JayAdityaNautiyal
I tried to reproduce the error with the code you provided.
The reason may be that the bytes of the second round are missing some required data (probably the audio metadata, which I think lives in the first file chunk), so they cannot be parsed successfully by the PyAV module.
The following may be a temporary workaround in your backend/app.py:
```python
# backend/app.py
async def handle_websocket(websocket, path):
    print(f"Client connected to {path}")
    # save all bytes received from the client so far
    binary_total = b""
    try:
        async for binary_data in websocket:
            # concatenate the new binary_data with the old binary data
            binary_total += binary_data
            # and pass them all to io.BytesIO
            dataBuffer = io.BytesIO(binary_total)
            # pass to transcribe
            segments, info = model.transcribe(dataBuffer, beam_size=5)
            # omit......
```
@shunnNet This worked for us. Thanks a lot.
thanks a lot @shunnNet
This worked.
I had this same issue when implementing this with Django and websockets. The solution I came up with is to ensure that each chunk is sent as separate audio data from the client side (initially I treated the chunks as a single audio file split into bytes, which leads to corrupted data due to the loss of some metadata and the audio byte header). This ensures that the chunks sent from the client side are not corrupted and arrive as standalone audio data on the server side. I don't know whom this might help, though.
@Mikehade So happy to see you already solved this issue. Could you please share the solution with me? I am facing the same issue, and it makes me so confused.
@bakazhou, you just have to start a new recording to generate a new file after every chunk so it does not get corrupted, as seen in the HTML snippet below, since you lose the audio metadata if you do not.
```javascript
const socket = new WebSocket(`ws://localhost:port/path`); // fill in your own port and path
socket.binaryType = 'arraybuffer'; // Set binary type for the socket connection

socket.onopen = () => {
  document.querySelector('#status').textContent = 'Connected';
  console.log({ event: 'onopen' });

  // Function to record and send chunks
  // `stream` is the MediaStream obtained from getUserMedia
  function recordAndSend(stream) {
    const mediaRecorder = new MediaRecorder(stream, { mimeType: 'audio/webm' });
    mediaRecorder.ondataavailable = e => {
      if (e.data.size > 0) {
        // Send the chunk to the server
        socket.send(e.data);
      }
    };
    mediaRecorder.onstop = e => {
      // Here you can handle the end of recording if you want to
    };
    setTimeout(() => mediaRecorder.stop(), CHUNK_DURATION_MS); // Stop recording after the specified chunk duration
    mediaRecorder.start();
  }

  // Generate a new file every specified chunk duration
  setInterval(() => recordAndSend(stream), CHUNK_DURATION_MS);
};
```
You also have to set your chunk duration (`CHUNK_DURATION_MS`).
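With this client, every websocket message is a complete, standalone WebM file, so the server can wrap each message in `io.BytesIO` on its own. A minimal server-side sketch (the `transcribe_message` helper and the injected `transcriber` callable are illustrative names, not part of faster-whisper; the transcriber would be `model.transcribe` in practice):

```python
import io

def transcribe_message(message: bytes, transcriber):
    """Wrap one standalone WebM message in a file-like object and transcribe it.

    `transcriber` is any callable with the shape of model.transcribe,
    returning (segments, info).
    """
    buffer = io.BytesIO(message)
    segments, info = transcriber(buffer)
    # collect the text of each transcribed segment
    return [segment.text for segment in segments]

# Assumed usage inside the websocket handler (sketch):
# async for message in websocket:
#     texts = transcribe_message(message, model.transcribe)
#     await websocket.send(json.dumps({"text": " ".join(texts)}))
```

Injecting the transcriber keeps the helper independent of a loaded model, which also makes it easy to test with a stub.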
OS: Windows 11
Python version: 3.10.11
websockets version: 11.0.3
faster-whisper version: 0.7.0
When I retrieve user audio in real time from the microphone through the client WebSocket and wrap it with io.BytesIO(), I encounter an error when passing it to the transcribe function of faster-whisper.
error:
```
File "D:\Software\Anaconda\envs\RealTime\lib\site-packages\faster_whisper\audio.py", line 45, in decode_audio
  with av.open(input_file, metadata_errors="ignore") as container:
File "av\container\core.pyx", line 401, in av.container.core.open
File "av\container\core.pyx", line 272, in av.container.core.Container.__cinit__
File "av\container\core.pyx", line 292, in av.container.core.Container.err_check
File "av\error.pyx", line 336, in av.error.err_check
av.error.InvalidDataError: [Errno 1094995529] Invalid data found when processing input: ''
this is my code:
```python
async def process_audio_data(audio_frame, websocket):
    audio_bytes = b"".join(audio_frame)
    wav_stream = BytesIO(audio_bytes)
    segments, info = model.transcribe(
        wav_stream, beam_size=5, vad_filter=True,
        vad_parameters=dict(min_silence_duration_ms=500))
    for segment in segments:
        message = json.dumps({"text": segment.text})
        await websocket.send(message)
```
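A `BytesIO` over raw PCM bytes has no container header, so PyAV cannot detect the format, which matches the `InvalidDataError` above. One option is to write a proper WAV header in memory first. A sketch (assuming the client sends 16-bit mono PCM at 16 kHz; `pcm_to_wav` is a hypothetical helper name):

```python
import io
import wave

def pcm_to_wav(pcm_bytes: bytes, sample_rate: int = 16000) -> io.BytesIO:
    """Wrap raw 16-bit mono PCM in a WAV container so PyAV can parse it."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wf:
        wf.setnchannels(1)          # mono
        wf.setsampwidth(2)          # 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)
    buffer.seek(0)  # rewind so the reader starts at the RIFF header
    return buffer

# Assumed use in process_audio_data (sketch):
# wav_stream = pcm_to_wav(b"".join(audio_frame))
# segments, info = model.transcribe(wav_stream, beam_size=5, ...)
```

Alternatively, if the client sends raw float32 samples, converting the bytes to a NumPy array and passing that array to `transcribe` directly sidesteps the container question entirely.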