Closed henriklied closed 2 months ago
Hi Henrik, glad to see you've found some use in it!
That's quite odd indeed. I can't seem to replicate this with the HLS link you provided in my own development environment (Win + CUDA on VLC 3.0.20). I did notice quite a few hallucinations from the model where it tried to fill in the silences, but it did output subtitles wherever there was speech.
There are a few things that may still be worth trying, though:
- Pass --use-cuda=false to the script. This is CPU-only, so it'll be slower, but it should at least help narrow things down. I'm not entirely sure whether older cards work as they should under FP16 precision, as I've only tested this on an RTX 4090 and on CPU myself.
- Try the --hard-subs flag. That'll embed the subtitles into the stream directly, rather than generating VTT files separately. If it's an issue with the player's handling of WebVTT, that should reveal it.
Hi Psychotropos, and thanks for getting back to me so quickly! The issues went away when I changed to using my own m3u8 endpoint, and now things are working well.
A follow-up question: Have you considered feeding the Whisper process with concatenated audio segments in order to increase the context for the transcription? I assume that would improve the quality of the transcription. It would require some work to split the transcribed text back into the right chunks, but maybe it's worth a try?
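To make the "split it back into the right chunks" part a bit more concrete, here is a rough sketch of how it might work. It is not taken from the project's code. It assumes the transcription step yields (start, end, text) tuples on the timeline of the concatenated audio and that the duration of each original chunk is known; `Cue` and `split_cues_by_chunk` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cue:
    start: float  # seconds, relative to the start of its own chunk
    end: float
    text: str


def split_cues_by_chunk(
    timed_segments: List[Tuple[float, float, str]],
    chunk_durations: List[float],
) -> List[List[Cue]]:
    """Map transcript segments from a concatenated window back to chunks.

    timed_segments: (start, end, text) on the concatenated timeline.
    chunk_durations: duration in seconds of each original chunk, in order.
    A segment that straddles a boundary is assigned to the chunk that
    contains its start time and is clipped at that chunk's end.
    Segments starting after the last chunk are dropped.
    """
    # Start offset of every chunk on the concatenated timeline.
    offsets = []
    total = 0.0
    for duration in chunk_durations:
        offsets.append(total)
        total += duration

    per_chunk: List[List[Cue]] = [[] for _ in chunk_durations]
    for start, end, text in timed_segments:
        for i, offset in enumerate(offsets):
            chunk_end = offset + chunk_durations[i]
            if offset <= start < chunk_end:
                # Re-base the timestamps so they are relative to the chunk.
                per_chunk[i].append(
                    Cue(start - offset, min(end, chunk_end) - offset, text.strip())
                )
                break
    return per_chunk


if __name__ == "__main__":
    # Three hypothetical 6-second chunks transcribed as one 18-second window.
    segments = [(0.5, 2.0, " Hello"), (5.5, 7.0, " world"), (11.0, 13.0, " again")]
    for index, cues in enumerate(split_cues_by_chunk(segments, [6.0, 6.0, 6.0])):
        print(index, cues)
```

Clipping at the boundary is the simplest choice here; splitting a straddling segment's text across two chunks would probably need word-level timestamps.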
Hi Henrik,
It's a good suggestion, and something I've considered myself. I'll investigate it further when I find the time. I'm closing this issue for now given that the original problem has been dealt with, but do feel free to raise additional issues and/or PRs if you think of anything else. Thanks again.
Very cool project! I'm trying to get it up and running, and everything seems to be starting as it should, but there are very few requests towards the .vtt-endpoints. And I'm unable to see the subtitles.
I see a couple of requests to the .vtt-chunks when selecting the subtitle track in the player (VLC, Quicktime Player), but none of these vtt-chunks are displayed in the player.
Any thoughts on why that might be?
Reproduce by:
python main.py -u "https://cph-msl.akamaized.net/hls/live/2000341/test/master.m3u8"
Here are my server logs when starting a client connection in Quicktime and enabling subs: