collabora / WhisperLive

A nearly-live implementation of OpenAI's Whisper.
MIT License

HLS Stream Transcription with timestamps #138

Open drajvver opened 6 months ago

drajvver commented 6 months ago

Hello!

First of all, amazing work on the library, I really appreciate what you're doing here!

I'm trying to understand how (and whether it's even possible) to generate an SRT file with timestamps from an HLS stream.

I have created a simple script:

```python
from whisper_live.client import TranscriptionClient, Client

def test():
    client = Client(
        "localhost",
        9090,
        lang="en",
        translate=False,
        model="tiny.en",
        srt_file_path="test.srt",
    )

    # Busy-wait until the server is ready to receive audio;
    # give up if the server is busy (waiting) or reports an error.
    while not client.recording:
        if client.waiting or client.server_error:
            client.close_websocket()
            return

    client.process_hls_stream(
        "http://as-hls-ww-live.akamaized.net/pool_904/live/ww/bbc_1xtra/bbc_1xtra.isml/bbc_1xtra-audio%3d96000.norewind.m3u8"
    )

test()
```

but it doesn't work as I would expect. It sends the stream to the server but either receives no response or hangs. When I use the original TranscriptionClient it works fine, but the text just streams without any timestamps.

Is there a way to make this work?

Thanks!

makaveli10 commented 6 months ago

Hello @drajvver, thanks for using WhisperLive. The SRT file is written at the end of the session. The transcription that is streamed while the session runs has no timestamps; it is simply whatever audio the backend has processed so far. If you want timestamped subtitles, you would have to take the start and end times of each segment returned by the backend in its response and append them to an SRT file yourself. This would be a nice addition.
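A minimal sketch of that approach, assuming each segment in the backend's response is a dict with `start` and `end` keys (seconds, possibly sent as strings) and a `text` key; those field names are an assumption. `SrtAppender` and `srt_timestamp` are hypothetical helpers, not part of WhisperLive. Assuming the backend may re-send earlier segments, the sketch skips any segment whose start time is not later than the last one written. It does not handle a final segment whose text is still being revised.

```python
from typing import Iterable, Mapping


def srt_timestamp(seconds) -> str:
    """Format a time in seconds (float or numeric string) as HH:MM:SS,mmm."""
    total_ms = int(round(float(seconds) * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


class SrtAppender:
    """Hypothetical helper: appends newly received segments to an SRT file."""

    def __init__(self, path: str):
        self.path = path
        self.index = 0          # SRT cue counter (1-based in the file)
        self.last_start = -1.0  # start time of the last cue written
        open(path, "w", encoding="utf-8").close()  # start from an empty file

    def add_segments(self, segments: Iterable[Mapping]) -> int:
        """Append segments not written yet; return how many were written."""
        written = 0
        with open(self.path, "a", encoding="utf-8") as f:
            for seg in segments:
                start, end = float(seg["start"]), float(seg["end"])
                if start <= self.last_start:
                    continue  # assumed to be a re-sent, already-written segment
                self.index += 1
                f.write(
                    f"{self.index}\n"
                    f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
                    f"{seg['text'].strip()}\n\n"
                )
                self.last_start = start
                written += 1
        return written
```

You would call `add_segments(...)` with the segment list from each backend response as it arrives, so the SRT file grows during the stream instead of being written only at the end.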

fallenangel3k commented 6 months ago

How do I "end" a session properly? Ctrl-C, which is what I am doing now, does not seem to be the right way.
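One common Python pattern for stopping on Ctrl-C is to catch `KeyboardInterrupt` and tear the connection down in a `finally` block. This sketch assumes that `close_websocket()`, the method the script above already calls when the server is busy or errors, is an appropriate teardown call; whether it also causes the SRT file to be written is not confirmed here. `run_session` is a hypothetical helper.

```python
from typing import Callable, Protocol


class ClosableClient(Protocol):
    """Anything with a close_websocket() method, such as the client above."""

    def close_websocket(self) -> None: ...


def run_session(client: ClosableClient, work: Callable[[], None]) -> bool:
    """Run `work` (e.g. streaming audio) and always close the connection.

    Returns True if `work` finished on its own, False if it was
    interrupted with Ctrl-C (KeyboardInterrupt).
    """
    finished = False
    try:
        work()
        finished = True
    except KeyboardInterrupt:
        print("Interrupted, closing session...")
    finally:
        # Runs on normal completion, on Ctrl-C, and on any other exception.
        client.close_websocket()
    return finished
```

For example, `run_session(client, lambda: client.process_hls_stream(url))` would stream until the HLS source ends or until Ctrl-C is pressed, closing the websocket in both cases.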

Maybe also add some optional console output, enabled by something like "--debug", to see what it is doing, as in the previous version. Right now nothing is shown in the console, although everything works fine. I think this was changed to reduce clutter in the console, especially when working with multiple users at once, but having it back as an option would be nice. An extended verbose mode could also print each individual transcription with its start and end times as plain text in the console, regardless of how the final transcript turns out. (I am working single-user and would love that feature, both to verify the output and to build on it.)
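A minimal sketch of what such an opt-in flag could look like, using Python's standard `argparse` and `logging` modules. The `--debug` flag, the logger name, and the helper functions are all hypothetical and not existing WhisperLive options; segments are again assumed to be dicts with `start`, `end`, and `text` keys.

```python
import argparse
import logging
from typing import Iterable, Mapping, Optional, Sequence

logger = logging.getLogger("transcription")  # hypothetical logger name


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Hypothetical client wrapper")
    parser.add_argument(
        "--debug",
        action="store_true",
        help="print each segment with its start and end time",
    )
    return parser.parse_args(argv)


def configure_logging(debug: bool) -> None:
    # DEBUG shows per-segment lines; WARNING keeps the console quiet by default.
    logging.basicConfig(
        level=logging.DEBUG if debug else logging.WARNING,
        format="%(message)s",
    )


def format_segment(seg: Mapping) -> str:
    """Render one segment as '[start --> end] text' with times in seconds."""
    return f"[{float(seg['start']):.3f} --> {float(seg['end']):.3f}] {seg['text'].strip()}"


def log_segments(segments: Iterable[Mapping]) -> None:
    for seg in segments:
        logger.debug(format_segment(seg))
```

At startup you would call `configure_logging(parse_args().debug)`, then pass each response's segments to `log_segments(...)`; without `--debug` nothing extra is printed, which keeps multi-user setups uncluttered.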