Description
When using the WebSocket transcription endpoint /v1/audio/transcriptions, the server returns the same transcription twice for a single audio input. This happens consistently for every audio file sent to the server.
Steps to Reproduce
Set up the WebSocket connection with the following URL:
Send an audio file in PCM format through the WebSocket connection.
Observe the server responses. The server responds twice with the same transcription.
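The URL itself is missing from this report, but the access log in the Logs section records the path and query string the client used. Here is a minimal Python sketch of the first two steps, assuming a hypothetical host and port (`localhost:8000`) and an arbitrary chunk size. `build_url` and `chunk_pcm` are helper names made up for this example and are not part of the server:

```python
from urllib.parse import urlencode


def build_url(host: str, port: int, **params) -> str:
    """Build the WebSocket URL; host and port are placeholders (assumption)."""
    # safe="/" keeps the slash in the model name unescaped, as in the log.
    query = urlencode(params, safe="/")
    return f"ws://{host}:{port}/v1/audio/transcriptions?{query}"


def chunk_pcm(data: bytes, chunk_size: int = 4096):
    """Split raw PCM bytes into fixed-size chunks (chunk size is illustrative)."""
    for offset in range(0, len(data), chunk_size):
        yield data[offset:offset + chunk_size]


# Query parameters copied from the access log line in this report.
url = build_url(
    "localhost",
    8000,
    model="Systran/faster-whisper-large-v3",
    language="en",
    response_format="json",
    temperature=0,
)
chunks = list(chunk_pcm(b"\x00" * 10000))
```

Each chunk would then be sent as a binary frame with whichever WebSocket client library is in use, and every message received back recorded so the responses can be counted.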
Expected Behavior
The server should only respond once with the transcription for each audio input.
Actual Behavior
The server responds twice with the same transcription for a single audio input.
Logs
Below are the logs showing the duplicate responses:
INFO: connection closed
INFO: ('100.82.89.123', 62682) - "WebSocket /v1/audio/transcriptions?model=Systran/faster-whisper-large-v3&language=en&response_format=json&temperature=0" [accepted]
INFO: connection open
2024-07-15 20:09:37,600:INFO:faster_whisper_server.logger:_transcribe:Transcribed Audio(start=0.00, end=18.54) in 0.86 seconds. Prompt: None. Transcription: There is a problem. It respond twice when I ask one time? I don't know why it happen. Could you fix that?
2024-07-15 20:09:37,740:INFO:faster_whisper_server.logger:audio_receiver:No data received in 1.0 seconds. Closing the connection.
2024-07-15 20:09:37,740:INFO:faster_whisper_server.logger:close:AudioStream closed
2024-07-15 20:09:38,551:INFO:faster_whisper_server.logger:_transcribe:Transcribed Audio(start=0.00, end=18.54) in 0.81 seconds. Prompt: None. Transcription: There is a problem. It respond twice when I ask one time? I don't know why it happen. Could you fix that?
2024-07-15 20:09:38,554:INFO:faster_whisper_server.logger:audio_transcriber:Audio transcriber finished
2024-07-15 20:09:38,554:INFO:faster_whisper_server.logger:transcribe_stream:Closing the connection.
INFO: connection closed
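In the log above, the same span (start=0.00, end=18.54) is transcribed once before the AudioStream is closed and once after. A small sketch for spotting such repeats in a longer log, assuming the line format shown above; `repeated_spans` is a hypothetical helper, not something the server provides:

```python
import re
from collections import Counter

# Matches the "_transcribe" log lines shown in this report.
SPAN = re.compile(
    r"Transcribed Audio\(start=(?P<start>[\d.]+), end=(?P<end>[\d.]+)\)"
    r".*?Transcription: (?P<text>.*)$"
)


def repeated_spans(log_lines):
    """Return {(start, end, text): count} for spans logged more than once."""
    counts = Counter()
    for line in log_lines:
        match = SPAN.search(line)
        if match:
            key = (float(match["start"]), float(match["end"]), match["text"].strip())
            counts[key] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Three lines copied from the log in this report.
sample = [
    "2024-07-15 20:09:37,600:INFO:faster_whisper_server.logger:_transcribe:Transcribed Audio(start=0.00, end=18.54) in 0.86 seconds. Prompt: None. Transcription: There is a problem. It respond twice when I ask one time? I don't know why it happen. Could you fix that?",
    "2024-07-15 20:09:37,740:INFO:faster_whisper_server.logger:audio_receiver:No data received in 1.0 seconds. Closing the connection.",
    "2024-07-15 20:09:38,551:INFO:faster_whisper_server.logger:_transcribe:Transcribed Audio(start=0.00, end=18.54) in 0.81 seconds. Prompt: None. Transcription: There is a problem. It respond twice when I ask one time? I don't know why it happen. Could you fix that?",
]
duplicates = repeated_spans(sample)
```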
Environment
Operating System: 22.04
Additional Information
The issue seems to be related to the task group handling in the transcribe_stream function, where the audio processing tasks might be triggering multiple responses. Here is the relevant section of the code:
@app.websocket("/v1/audio/transcriptions")
async def transcribe_stream(
    ws: WebSocket,
    model: Annotated[ModelName, Query()] = config.whisper.model,
    language: Annotated[Language | None, Query()] = config.default_language,
    response_format: Annotated[ResponseFormat, Query()] = config.default_response_format,
    temperature: Annotated[float, Query()] = 0.0,
) -> None:
    await ws.accept()
    transcribe_opts = {
        "language": language,
        "temperature": temperature,
        "vad_filter": True,
        "condition_on_previous_text": False,
    }
    whisper = load_model(model)
    asr = FasterWhisperASR(whisper, **transcribe_opts)
    audio_stream = AudioStream()
    task_group = asyncio.TaskGroup()
    async with task_group:
        await task_group.create_task(audio_receiver(ws, audio_stream))
        async for transcription in audio_transcriber(asr, audio_stream):
            logger.debug(f"Sending transcription: {transcription.text}")
            if ws.client_state == WebSocketState.DISCONNECTED:
                break
            if response_format == ResponseFormat.TEXT:
                await ws.send_text(transcription.text)
            elif response_format == ResponseFormat.JSON:
                await ws.send_json(TranscriptionJsonResponse(text=transcription.text).model_dump())
            elif response_format == ResponseFormat.VERBOSE_JSON:
                await ws.send_json(TranscriptionVerboseJsonResponse(text=transcription.text).model_dump())
        # Clean up tasks to ensure no further responses are sent
        task_group.cancel()
    if ws.client_state != WebSocketState.DISCONNECTED:
        logger.info("Closing the connection.")
        await ws.close()
Thank you for your assistance in resolving this issue. Please let me know if you need any additional information.