Open austinm1120 opened 5 months ago
@austinm1120 Thanks for reporting the issue. Does whisper-live behave similarly with other input types as well, or is it only HLS?
Same issue with RTMP / RTSP.
The same happens to me, but in my case it's exactly 10 minutes every time:
[INFO]: Server disconnected due to overtime.
[INFO]: Websocket connection closed: 1000:
I'm investigating on my own and will let you know if I find a fix.
OK, I found the "issue" :) It's not an issue, it was designed to work that way: https://github.com/collabora/WhisperLive/blob/main/whisper_live/server.py#L28
So in my case I'll refactor to strip that part out.
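For anyone wondering what such a limit looks like, here is a minimal sketch of a server-side connection time limit. It is not the WhisperLive code: the class name `ConnectionTimer` and the parameter `max_connection_time` are hypothetical, and the 600-second default is only chosen to match the 10-minute cutoff reported above.

```python
import time


class ConnectionTimer:
    """Hypothetical helper that tracks how long a client has been connected."""

    def __init__(self, max_connection_time=600.0, clock=time.monotonic):
        # Assumed limit in seconds; 600 mirrors the 10-minute cutoff seen above.
        self.max_connection_time = max_connection_time
        self._clock = clock
        self._started_at = clock()

    def elapsed(self):
        """Seconds since the connection was registered."""
        return self._clock() - self._started_at

    def is_overtime(self):
        """True once the connection has reached the allowed duration."""
        return self.elapsed() >= self.max_connection_time
```

A server loop could call `is_overtime()` before handling each audio frame and close the websocket when it returns True. Raising the limit, or making it configurable, would be a smaller change than removing the check entirely.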
I set up a local HLS stream playing a long video of someone talking.
Everything seems fine until exactly 2 minutes in, when the transcription stops completely.
INFO:faster_whisper:Processing audio with duration 00:07.936
INFO:faster_whisper:Processing audio with duration 00:02.984
INFO:faster_whisper:Processing audio with duration 00:03.032
INFO:faster_whisper:Processing audio with duration 00:01.432
INFO:faster_whisper:Processing audio with duration 00:03.480
INFO:faster_whisper:Processing audio with duration 00:05.272
INFO:faster_whisper:Processing audio with duration 00:01.152
INFO:faster_whisper:Processing audio with duration 00:03.200
INFO:faster_whisper:Processing audio with duration 00:02.548
INFO:faster_whisper:Processing audio with duration 00:04.596
INFO:faster_whisper:Processing audio with duration 00:01.796
INFO:faster_whisper:Processing audio with duration 00:03.844
INFO:faster_whisper:Processing audio with duration 00:02.484
INFO:faster_whisper:Processing audio with duration 00:02.484
INFO:faster_whisper:Processing audio with duration 00:02.484
INFO:faster_whisper:Processing audio with duration 00:02.484
INFO:faster_whisper:Processing audio with duration 00:02.484
INFO:faster_whisper:Processing audio with duration 00:02.484
INFO:faster_whisper:Processing audio with duration 00:02.484
In the server logs I can see that the server processes chunks of variable length. The problem starts when the "00:02.484" chunks keep getting processed. I'm unsure whether the same chunk is being sent over and over and the server keeps transcribing it, so that the client appears to be "stuck", or whether it is stuck in a different loop of some sort.
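To make the "same chunk repeating" pattern easier to spot in a longer log, here is a small sketch that pulls the durations out of the faster_whisper log lines and counts how many times the last one repeats at the end. The function names are made up for illustration, and it assumes the timestamps are in MM:SS.mmm form as in the log above.

```python
import re

# Matches the "Processing audio with duration MM:SS.mmm" lines shown above.
DURATION_RE = re.compile(r"Processing audio with duration (\d+):(\d+\.\d+)")


def parse_durations(log_text):
    """Return chunk durations in seconds, in the order they appear in the log."""
    return [int(m) * 60 + float(s) for m, s in DURATION_RE.findall(log_text)]


def trailing_repeat_count(durations):
    """Count how many times the final duration repeats at the end of the list."""
    if not durations:
        return 0
    count = 0
    for d in reversed(durations):
        if d != durations[-1]:
            break
        count += 1
    return count
```

A trailing repeat count that keeps growing while the stream is still playing would suggest that the server is reprocessing the same buffer rather than receiving new audio. It does not tell you on its own whether the client or the server is at fault.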
Setting use_vad to True doesn't seem to make a difference.
I have tried on both Mac (M3 Max chip) and Windows 10, with both the Docker and the Python server. All of them produce the same results.
This is the client code: