collabora / WhisperLive

A nearly-live implementation of OpenAI's Whisper.
MIT License
1.85k stars, 248 forks

HLS Transcription Stops After A Couple Minutes #205

Status: Open. austinm1120 opened this issue 5 months ago.

austinm1120 commented 5 months ago

I set up a local HLS stream playing a long video of someone talking.

Everything works well until exactly 2 minutes in, when the transcription stops completely.

INFO:faster_whisper:Processing audio with duration 00:07.936
INFO:faster_whisper:Processing audio with duration 00:02.984
INFO:faster_whisper:Processing audio with duration 00:03.032
INFO:faster_whisper:Processing audio with duration 00:01.432
INFO:faster_whisper:Processing audio with duration 00:03.480
INFO:faster_whisper:Processing audio with duration 00:05.272
INFO:faster_whisper:Processing audio with duration 00:01.152
INFO:faster_whisper:Processing audio with duration 00:03.200
INFO:faster_whisper:Processing audio with duration 00:02.548
INFO:faster_whisper:Processing audio with duration 00:04.596
INFO:faster_whisper:Processing audio with duration 00:01.796
INFO:faster_whisper:Processing audio with duration 00:03.844
INFO:faster_whisper:Processing audio with duration 00:02.484
INFO:faster_whisper:Processing audio with duration 00:02.484
INFO:faster_whisper:Processing audio with duration 00:02.484
INFO:faster_whisper:Processing audio with duration 00:02.484
INFO:faster_whisper:Processing audio with duration 00:02.484
INFO:faster_whisper:Processing audio with duration 00:02.484
INFO:faster_whisper:Processing audio with duration 00:02.484

In the server logs I can see chunks of variable length being processed. The problem starts when the server begins processing "00:02.484" chunks over and over. I'm not sure whether the client keeps sending the same chunk and the server keeps transcribing it (so the client appears "stuck"), or whether it's stuck in some other kind of loop.
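One way to look at this from the log alone is to pull out the chunk durations and count how many times the last value repeats at the end. This is a minimal sketch that assumes the log format shown in this issue (`Processing audio with duration MM:SS.mmm`); `parse_durations` and `trailing_repeat` are hypothetical helpers, not part of WhisperLive. A long run of identical durations is consistent with the server repeatedly processing the same buffered audio instead of receiving new chunks, but on its own it does not tell you which of the two explanations above is correct.

```python
from __future__ import annotations

import re

# Matches the duration that faster_whisper logs for each processed chunk,
# e.g. "Processing audio with duration 00:02.484" -> ("00", "02.484").
DURATION_RE = re.compile(r"Processing audio with duration (\d+):(\d+\.\d+)")


def parse_durations(log_text: str) -> list[float]:
    """Return every logged chunk duration, in seconds, in log order."""
    return [int(minutes) * 60 + float(seconds)
            for minutes, seconds in DURATION_RE.findall(log_text)]


def trailing_repeat(durations: list[float]) -> tuple[float | None, int]:
    """Return the last duration and how many times it repeats at the end."""
    if not durations:
        return None, 0
    last = durations[-1]
    count = 0
    for d in reversed(durations):
        if d != last:
            break
        count += 1
    return last, count


if __name__ == "__main__":
    # Excerpt of the server log from this issue (last few entries only).
    log = (
        "INFO:faster_whisper:Processing audio with duration 00:01.796 "
        "INFO:faster_whisper:Processing audio with duration 00:03.844 "
        "INFO:faster_whisper:Processing audio with duration 00:02.484 "
        "INFO:faster_whisper:Processing audio with duration 00:02.484 "
        "INFO:faster_whisper:Processing audio with duration 00:02.484"
    )
    value, count = trailing_repeat(parse_durations(log))
    print(f"last duration {value:.3f}s repeated {count} times")
    # prints: last duration 2.484s repeated 3 times
```

Running it over the full log above would report the 00:02.484 run; comparing that against what the client is actually sending is what would separate a repeated chunk from a stalled buffer.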

Setting use_vad to True doesn't seem to make a difference.

I have tried this on both macOS (M3 Max) and Windows 10, using both the Docker image and the Python server. Every combination produces the same result.

This is the client code:

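from whisper_live.client import TranscriptionClient

# Connect to a WhisperLive server running locally on port 9090.
client = TranscriptionClient(
  "localhost",
  9090,
  lang="en",         # language of the audio
  translate=False,   # transcribe only, no translation
  model="small",     # Whisper model size to use
  use_vad=False,     # voice activity detection disabled
)

# Transcribe the audio from the HLS stream at this URL.
client(hls_url="http://localhost.m3u8")
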
makaveli10 commented 4 months ago

@austinm1120 Thanks for reporting the issue. Does whisper-live behave the same way with other input types, or only with HLS?

nums commented 4 months ago

Same issue with RTMP / RTSP.

aliuspetraska commented 3 months ago

The same happens to me, but in my case it stops after exactly 10 minutes every time:

[INFO]: Server disconnected due to overtime.
[INFO]: Websocket connection closed: 1000: 

I'm investigating on my own and will let you know if I find a fix.

aliuspetraska commented 3 months ago

OK, I found the "issue" :) It's not a bug; the server was designed to work that way: https://github.com/collabora/WhisperLive/blob/main/whisper_live/server.py#L28

So in my case, I'll refactor the server to remove that limit.
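The behaviour linked above is a per-connection time limit: the server remembers when each client connected and disconnects it once a fixed time budget has elapsed, which matches the client message "Server disconnected due to overtime." Below is a minimal sketch of that pattern, assuming a 600-second limit (the 10 minutes observed here). `ConnectionTracker`, `add`, `is_timed_out`, and `MAX_CONNECTION_TIME` are hypothetical names for illustration, not WhisperLive's actual code; check `whisper_live/server.py` for the real variable and logic.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field

# Hypothetical limit: 600 s matches the 10-minute disconnect reported here.
MAX_CONNECTION_TIME = 600.0


@dataclass
class ConnectionTracker:
    """Hypothetical tracker that records when each client connected."""
    max_connection_time: float = MAX_CONNECTION_TIME
    start_times: dict = field(default_factory=dict)

    def add(self, client_id: str, now: float | None = None) -> None:
        """Record the connection time (monotonic clock unless `now` is given)."""
        self.start_times[client_id] = time.monotonic() if now is None else now

    def is_timed_out(self, client_id: str, now: float | None = None) -> bool:
        """True once the client has been connected for at least the limit."""
        now = time.monotonic() if now is None else now
        return now - self.start_times[client_id] >= self.max_connection_time


if __name__ == "__main__":
    tracker = ConnectionTracker()
    tracker.add("client-1", now=0.0)
    print(tracker.is_timed_out("client-1", now=599.0))  # False
    print(tracker.is_timed_out("client-1", now=600.0))  # True

    # An infinite limit never expires, i.e. the check is effectively removed.
    unlimited = ConnectionTracker(max_connection_time=float("inf"))
    unlimited.add("client-2", now=0.0)
    print(unlimited.is_timed_out("client-2", now=1e9))  # False
```

Raising the limit (or setting it to infinity, as above) is the kind of change described in the last comment. A finite limit is presumably a deliberate trade-off so that long-running clients cannot hold a server connection indefinitely. Note that this limit explains the 10-minute disconnect, not the 2-minute stall in the original report.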