Closed: manageseverin closed this issue 1 year ago
I recommend against the CLI if you consume the code programmatically.
Instead, integrate either the PowerShell module, the .NET 6 library, or the C++ COM interfaces, whichever is easier for you. You will get proper progress updates in the form of callback functions (in PowerShell, the DataAdded event on the Progress stream), and proper error handling. You will also get a way to keep language models in VRAM, which will probably save time and disk bandwidth.
BTW, on my RTX 2070 the relative speed is about 5.7 on Bulgarian speech + translation. To be honest, loading the model on my PC is pretty fast and I have no complaints. I guess I can run multiple translations at the same time using one loaded model?
@manageseverin You can, but the feature is not in the examples and needs some programming.
Set the eGpuModelFlags.Cloneable flag when loading the model, then create another copy of the model by calling the iModel.clone method. This does not duplicate the tensors; it shares the old ones into another D3D11 device backed by the same hardware GPU. Nor does it duplicate the vocabulary: that data is kept in a std::shared_ptr C++ container, so cloning merely adds another reference to the data already in RAM.
Then you’ll be able to create two iContext objects from these two models, and use them from different CPU threads concurrently to transcribe two files.
Hello and thanks for the great project. It really is useful and fast. I'm calling main.exe from a Flask application to serve web requests, but I need some way to report progress to users. I don't want to parse the output where the subtitle timestamps are printed and infer the progress from the current timestamp divided by the sound clip's length.