raivisdejus opened 7 months ago
I uploaded the model when I tried to add support for the large Whisper models, especially large-v3.
The lack of support for the large models was also reported earlier, in an issue opened at the end of November. It seems the issue has since been deleted by the person who opened it (I only noticed now). It contained some detailed and relevant information, but there isn't much I can do to restore it.
Anyway, the large models can't be loaded, since `onnxruntime-node` doesn't support loading models with external data (all models over 2 GB).
On 27 November 2023, I opened an issue on the onnxruntime tracker requesting this capability.
I decided to upload the large-v3 Whisper model to the Hugging Face repository anyway, both to test the download process and in case anyone else wants to use it (say, with a different implementation of the ONNX runtime).
Currently, it doesn't work, as you understand. It will only become usable once the issue is fixed in `onnxruntime-node`.
Got it! Let's see how this develops, and thanks for creating the great echogarden :)
On v1.0.0, which I just released, I added support for `whisper.cpp`, which supports the large models (`large-v1`, `large-v2`, `large-v3`), quantized models, and GPU acceleration (CUDA and OpenCL).
All `whisper.cpp` models (including quantized ones) are auto-downloaded. A CUDA build will be auto-downloaded on Windows when you set `--enableGPU`, but not on Linux. macOS requires providing a custom executable path for `whisper.cpp`, since I don't have a Mac and the binary signature requirements are problematic.
I also added support for the OpenAI cloud platform (recognition and synthesis), which provides `large-v2` in the cloud, but it's a paid service ($0.006 / minute), and the local `whisper.cpp` GPU build can be faster if you have a modern NVIDIA GPU.
Just note that the timestamps returned by `whisper.cpp` are not as accurate as the ones given by the integrated Whisper implementation (which is now getting close to the accuracy of the official OpenAI reference Python implementation). I've also spent a lot of effort in the past week trying to improve it further and to reduce hallucinations and repetition loops.
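For reference, switching the original failing command over to the new `whisper.cpp` engine might look like the sketch below. This is a hypothetical invocation: the `--engine=whisper.cpp` value and the `--whisperCpp.model` option name are inferred from the `--whisper.model` pattern used elsewhere in this thread and are not confirmed here, so check the CLI's help output for the exact names.

```shell
# Hypothetical sketch -- option names inferred, not confirmed.
# Uses the whisper.cpp engine, which supports the large models.
echogarden transcribe audio.wav audio.txt \
  --language=lv \
  --engine=whisper.cpp \
  --whisperCpp.model=large-v3 \
  --enableGPU
```

On Windows, `--enableGPU` would also trigger the auto-download of a CUDA build, per the notes above; on macOS a custom `whisper.cpp` executable path would still be required.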
Running transcription with the Whisper large model crashes with an error
I am running the transcription on Ubuntu, Node v20.3.0, with this command: `echogarden transcribe audio.wav audio.txt --language=lv --engine=whisper --whisper.model=large`