GhostNaN / whisper-subs

WhisperSubs is an mpv Lua script that generates subtitles at runtime with whisper.cpp on Linux.
MIT License

Proof of concept: use mpv's local cache instead of yt-dlp for livestreams #2

Open TakeoIschiFan opened 5 months ago

TakeoIschiFan commented 5 months ago

At the moment, the default behaviour of this script is to subtitle a stream from the beginning using a separate yt-dlp download. As far as I know, this means the script can't be used for live translation or on non-rewindable or in-progress livestreams.

Now, mpv has a built-in cache, which can be enabled with the cache=yes option. As long as the read-ahead cache is always at least as large as the whisper chunk size, it should in theory be possible to save part of the cache to disk, transcribe that audio chunk, and display the subtitles at the same moment the user sees the corresponding video. That would essentially give semi-live translation.
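To make that condition concrete: the proof-of-concept snippet dumps the window from audio-pts + buffer to audio-pts + buffer + chunk, so the cache has to reach at least that far ahead of playback. The helpers below are hypothetical and not part of whisper-subs. The cached duration is assumed to come from something like mpv's demuxer-cache-duration property, and the cache itself would be sized with mpv's cache-secs option.

```lua
-- Sketch: how much read-ahead cache the semi-live approach needs.
-- chunk_size_ms mirrors the script's CHUNK_SIZE (milliseconds);
-- buffer_secs mirrors the snippet's look-ahead `buffer` (seconds).
-- Both helpers are hypothetical, not part of whisper-subs.

local function required_cache_secs(chunk_size_ms, buffer_secs)
    -- The dumped window ends at audio-pts + buffer + chunk, so the cache
    -- must extend at least that far ahead of the playback position.
    return chunk_size_ms / 1000 + buffer_secs
end

local function cache_is_sufficient(cached_secs, chunk_size_ms, buffer_secs)
    -- cached_secs: seconds buffered ahead of playback (assumed to be read
    -- from mpv's demuxer-cache-duration property; nil if unavailable).
    if cached_secs == nil then
        return false
    end
    return cached_secs >= required_cache_secs(chunk_size_ms, buffer_secs)
end

-- Example: 30 s chunks with a 10 s look-ahead need 40 s of cache.
print(required_cache_secs(30000, 10))
print(cache_is_sufficient(25, 30000, 10))
print(cache_is_sufficient(60, 30000, 10))
```

Since mpv's cache-duration estimate can be unreliable or missing, treating nil as "not enough" errs on the side of waiting rather than dumping a range that is not cached yet.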

Below is a proof-of-concept snippet that uses the dump-cache command to write a section of the cache to disk, after which we can process it just like any other audio chunk.


```lua
local function run_using_cache()
    local chunk_size_sec = CHUNK_SIZE / 1000
    -- How far ahead of playback we grab the audio. For this to work, the
    -- read-ahead cache must always hold at least chunk_size_sec + buffer seconds.
    local buffer = 10
    local cache_path = TMP_STREAM_PATH .. ".mkv"
    -- Start far in the past so the first chunk is processed immediately
    local current_pos = -10000

    createWAV(cache_path, 0)

    while true do
        -- audio-pts is unavailable (nil) until audio output has started;
        -- the numeric getter avoids relying on string-to-number coercion
        local new_pos = mp.get_property_number('audio-pts')
        if new_pos == nil or new_pos - current_pos < chunk_size_sec - 0.5 then
            -- Not enough new playback time yet: wait (this blocks the script)
            os.execute("sleep 0.1")
        else
            -- Dump [pos + buffer, pos + buffer + chunk] from the cache to disk,
            -- convert it to WAV, and transcribe it into subtitles
            current_pos = new_pos
            local start_t = current_pos + buffer
            mp.commandv("dump-cache", tostring(start_t), tostring(start_t + chunk_size_sec), cache_path)
            createWAV(cache_path, 0)
            appendSubs(start_t * 1000)
        end
    end
end
```

This piece of code works on local files and livestreams with some caveats:

If you think this functionality is nice to have, I can submit a pull request with a more fleshed-out version of this idea; otherwise I'll keep it for myself :)

Thanks

GhostNaN commented 5 months ago

Thanks for letting me know about dump-cache. I'm currently looking into implementing this myself, but it's not as straightforward as I was hoping. I'll let you know if I make good progress.

GhostNaN commented 5 months ago

Here's a first attempt that gets the general idea implemented: whispersubs.lua.txt

The new logic is mainly in runCache(), with some setup in start(), and it is enabled by the global CACHE_MODE. This is just temporary: I plan to eventually replace all of the streaming code with this, since the original code was just a hack to get around the limitation of streaming to a file.
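The polling loop in the first snippet blocks the Lua script with os.execute("sleep 0.1"). One possible non-blocking alternative, sketched here under assumptions and not taken from whispersubs.lua.txt, is to let mpv invoke the check every 100 ms through mp.add_periodic_timer. next_window is a hypothetical helper; CHUNK_SIZE, TMP_STREAM_PATH, createWAV and appendSubs are assumed to be the script's own globals, as in the original snippet.

```lua
-- Sketch only: a timer-driven variant of the first snippet's polling loop.
-- next_window() is a hypothetical helper; CHUNK_SIZE, TMP_STREAM_PATH,
-- createWAV and appendSubs are assumed to be whisper-subs globals.

-- Returns the start and end of the cache range to dump, or nil if less
-- than (chunk_size_sec - 0.5) s of playback has passed since last_pos.
local function next_window(last_pos, pos, chunk_size_sec, buffer)
    if pos == nil or pos - last_pos < chunk_size_sec - 0.5 then
        return nil
    end
    local start_t = pos + buffer
    return start_t, start_t + chunk_size_sec
end

-- The mpv wiring only runs inside mpv, where the global `mp` exists.
if mp ~= nil then
    local chunk_size_sec = CHUNK_SIZE / 1000
    local buffer = 10                 -- look-ahead in seconds, as in the snippet
    local cache_path = TMP_STREAM_PATH .. ".mkv"
    local last_pos = -math.huge       -- forces the first chunk immediately

    mp.add_periodic_timer(0.1, function()
        local pos = mp.get_property_number("audio-pts")
        local start_t, end_t = next_window(last_pos, pos, chunk_size_sec, buffer)
        if start_t == nil then
            return                    -- not due yet; mpv calls us again in 100 ms
        end
        last_pos = pos
        mp.commandv("dump-cache", tostring(start_t), tostring(end_t), cache_path)
        createWAV(cache_path, 0)
        appendSubs(start_t * 1000)
    end)
end
```

Because mpv runs a script's callbacks one at a time, the timer cannot fire again while createWAV and appendSubs are still running, so chunks never overlap; the difference from the original loop is that other events for the script can be handled between checks.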