Const-me / Whisper

High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
Mozilla Public License 2.0

Progress reporting on command line tools #133

Closed manageseverin closed 1 year ago

manageseverin commented 1 year ago

Hello and thanks for the great project. It really is useful and fast. I'm calling main.exe from a Flask application and serving web requests, but I need to somehow report the progress to users. I don't want to parse the output where subtitle timestamps are printed and figure out the progress from the current timestamp divided by the sound clip length.

Const-me commented 1 year ago

I recommend against using the CLI if you're consuming the code programmatically.

Instead, integrate either the PowerShell module, the .NET 6 library, or possibly the C++ COM interfaces, whichever is easiest for you. You will get proper progress updates in the form of callback functions (in PowerShell, the DataAdded event on the Progress stream), and proper error handling. You will also get a way to keep language models in VRAM, which will probably save time and disk bandwidth.

manageseverin commented 1 year ago

BTW, on my RTX 2070 the relative speed is about 5.7 on Bulgarian speech + translation. To be honest, loading the model on my PC is pretty fast and I have no complaints. I guess I can run multiple translations at the same time using one loaded model?

Const-me commented 1 year ago

@manageseverin You can, but the feature is not covered by the examples and needs some programming.

Set the eGpuModelFlags.Cloneable flag when loading the model, then create another copy of the model by calling the iModel.clone method. This does not duplicate the tensors; it shares the old ones into another D3D11 device backed by the same hardware GPU. It does not duplicate the vocabulary either: that data is kept in a std::shared_ptr C++ container, so cloning merely adds another reference to the data already in RAM.

Then you’ll be able to create two iContext objects from these two models, and use them from different CPU threads concurrently to transcribe two files.