fedirz / faster-whisper-server

https://hub.docker.com/r/fedirz/faster-whisper-server
MIT License
354 stars, 51 forks

# WebSocket Transcription Endpoint Returns Duplicate Transcriptions #37

Status: Open. Gan-Xing opened this issue 1 month ago.

Gan-Xing commented 1 month ago

Description

When using the WebSocket transcription endpoint /v1/audio/transcriptions, the server responds with duplicate transcriptions for a single audio input. This occurs consistently for every audio file sent to the server.

Steps to Reproduce

  1. Set up the WebSocket connection with the following URL:
    ws://<your-server-ip>:8000/v1/audio/transcriptions?model=<model>&language=<language>&response_format=json&temperature=0
  2. Send an audio file in PCM format through the WebSocket connection.
  3. Observe the server responses. The server responds twice with the same transcription.
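Here is a small client-side sketch of steps 1 and 2. It builds the connection URL from step 1 and splits a WAV file into raw PCM chunks for sending as binary WebSocket messages. The helper names `build_ws_url` and `iter_pcm_chunks` are hypothetical. The audio format (16 kHz, mono, 16-bit little-endian PCM) is an assumption about what the server expects; the issue does not state it.

```python
import io
import wave
from urllib.parse import urlencode


def build_ws_url(host: str, model: str, language: str, port: int = 8000) -> str:
    """Hypothetical helper: build the step-1 URL with the same query parameters."""
    query = urlencode(
        {
            "model": model,
            "language": language,
            "response_format": "json",
            "temperature": 0,
        },
        safe="/",  # keep "Systran/faster-whisper-large-v3" readable, as in the logs
    )
    return f"ws://{host}:{port}/v1/audio/transcriptions?{query}"


def iter_pcm_chunks(wav_bytes: bytes, chunk_ms: int = 100):
    """Hypothetical helper: yield raw PCM frames from a WAV file in fixed-size chunks.

    Assumption: the server expects 16 kHz, mono, 16-bit PCM.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if (wav.getframerate(), wav.getnchannels(), wav.getsampwidth()) != (16000, 1, 2):
            raise ValueError("expected 16 kHz mono 16-bit PCM")
        frames_per_chunk = wav.getframerate() * chunk_ms // 1000
        while True:
            chunk = wav.readframes(frames_per_chunk)  # returns b"" at end of file
            if not chunk:
                break
            yield chunk
```

Send each chunk as a binary message with any WebSocket client. Then stop sending; per the logs, the server closes the stream after 1.0 seconds without data.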

Expected Behavior

The server should only respond once with the transcription for each audio input.

Actual Behavior

The server responds twice with the same transcription for a single audio input.
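Until the server behavior is resolved, a client could drop a message whose text matches the message just received. This is only a hypothetical workaround sketch, not a fix. It assumes `response_format=json`, with each message shaped like `{"text": "..."}`.

```python
import json
from typing import Iterable, Iterator


def drop_repeated_messages(messages: Iterable[str]) -> Iterator[str]:
    """Hypothetical workaround: yield transcription texts, skipping exact repeats.

    Assumes each message is a JSON string like {"text": "..."}.
    """
    previous = None
    for raw in messages:
        text = json.loads(raw)["text"]
        if text == previous:
            continue  # identical to the previous message: treat it as a duplicate
        previous = text
        yield text
```

The trade-off is that a speaker who genuinely repeats the same sentence would also be filtered. That is one reason the fix belongs on the server.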

Logs

Below are the logs showing the duplicate responses:

```
INFO:     connection closed
INFO:     ('100.82.89.123', 62682) - "WebSocket /v1/audio/transcriptions?model=Systran/faster-whisper-large-v3&language=en&response_format=json&temperature=0" [accepted]
INFO:     connection open
2024-07-15 20:09:37,600:INFO:faster_whisper_server.logger:_transcribe:Transcribed Audio(start=0.00, end=18.54) in 0.86 seconds. Prompt: None. Transcription: There  is  a  problem.  It  respond  twice  when  I  ask  one  time?  I  don't  know  why  it happen.  Could you  fix  that?
2024-07-15 20:09:37,740:INFO:faster_whisper_server.logger:audio_receiver:No data received in 1.0 seconds. Closing the connection.
2024-07-15 20:09:37,740:INFO:faster_whisper_server.logger:close:AudioStream closed
2024-07-15 20:09:38,551:INFO:faster_whisper_server.logger:_transcribe:Transcribed Audio(start=0.00, end=18.54) in 0.81 seconds. Prompt: None. Transcription: There  is  a  problem.  It  respond  twice  when  I  ask  one  time?  I  don't  know  why  it happen.  Could you  fix  that?
2024-07-15 20:09:38,554:INFO:faster_whisper_server.logger:audio_transcriber:Audio transcriber finished
2024-07-15 20:09:38,554:INFO:faster_whisper_server.logger:transcribe_stream:Closing the connection.
INFO:     connection closed
```
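In the log above, the two `_transcribe` entries cover the same audio window (start=0.00, end=18.54) and produce identical text. The second one is logged after `AudioStream closed`. The sketch below uses a hypothetical helper that scans log lines in this format and reports any window transcribed more than once. The regex is written against the lines shown here and may need adjusting for other log formats.

```python
import re
from collections import Counter

# Matches the "_transcribe:Transcribed Audio(...)" lines shown in the log above.
TRANSCRIBED = re.compile(
    r"_transcribe:Transcribed Audio\(start=(?P<start>[\d.]+), end=(?P<end>[\d.]+)\)"
    r".*?Transcription: (?P<text>.*)$"
)


def find_duplicate_transcriptions(log_lines):
    """Hypothetical helper: return {(start, end, text): count} for repeated entries."""
    counts = Counter()
    for line in log_lines:
        match = TRANSCRIBED.search(line)
        if match:
            counts[(match["start"], match["end"], match["text"].strip())] += 1
    return {key: n for key, n in counts.items() if n > 1}
```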

Environment

Additional Information

The issue seems to be related to the task group handling in the transcribe_stream function, where the audio processing tasks might be triggering multiple responses. Here is the relevant section of the code:

```python
import asyncio
from typing import Annotated

from fastapi import Query, WebSocket
from starlette.websockets import WebSocketState

# app, config, ModelName, Language, ResponseFormat, load_model, FasterWhisperASR,
# AudioStream, audio_receiver, audio_transcriber, logger and the response models
# come from faster_whisper_server's own modules.


@app.websocket("/v1/audio/transcriptions")
async def transcribe_stream(
    ws: WebSocket,
    model: Annotated[ModelName, Query()] = config.whisper.model,
    language: Annotated[Language | None, Query()] = config.default_language,
    response_format: Annotated[ResponseFormat, Query()] = config.default_response_format,
    temperature: Annotated[float, Query()] = 0.0,
) -> None:
    await ws.accept()
    transcribe_opts = {
        "language": language,
        "temperature": temperature,
        "vad_filter": True,
        "condition_on_previous_text": False,
    }
    whisper = load_model(model)
    asr = FasterWhisperASR(whisper, **transcribe_opts)
    audio_stream = AudioStream()

    async with asyncio.TaskGroup() as task_group:
        # Start the receiver without awaiting it: awaiting the task would block until
        # the receiver finished, so nothing could be sent while audio streams in.
        receiver_task = task_group.create_task(audio_receiver(ws, audio_stream))
        async for transcription in audio_transcriber(asr, audio_stream):
            logger.debug(f"Sending transcription: {transcription.text}")
            if ws.client_state == WebSocketState.DISCONNECTED:
                break

            if response_format == ResponseFormat.TEXT:
                await ws.send_text(transcription.text)
            elif response_format == ResponseFormat.JSON:
                await ws.send_json(TranscriptionJsonResponse(text=transcription.text).model_dump())
            elif response_format == ResponseFormat.VERBOSE_JSON:
                await ws.send_json(TranscriptionVerboseJsonResponse(text=transcription.text).model_dump())

        # Clean up tasks to ensure no further responses are sent. asyncio.TaskGroup
        # (as of Python 3.13) has no cancel() method, so cancel the receiver task
        # directly; a cancelled child task does not make the group raise.
        receiver_task.cancel()

    if ws.client_state != WebSocketState.DISCONNECTED:
        logger.info("Closing the connection.")
        await ws.close()
```

Thank you for your assistance in resolving this issue. Please let me know if you need any additional information.

fedirz commented 1 month ago

Thanks for creating the issue. I'll look into this later this week.