echogarden-project / echogarden

Easy-to-use speech toolset. Written in TypeScript. Includes tools for synthesis, recognition, alignment, speech translation, language detection, source separation and more.
GNU General Public License v3.0

Error with whisper large model #34

Open raivisdejus opened 7 months ago

raivisdejus commented 7 months ago

Running transcription with the Whisper large model crashes with this error:

Create ONNX inference session for model 'large'.. 2024-01-13 22:41:57.360987848 [E:onnxruntime:, inference_session.cc:1798 operator()] Exception during initialization: /onnxruntime_src/onnxruntime/core/optimizer/initializer.cc:43 onnxruntime::Initializer::Initializer(const onnx::TensorProto&, const onnxruntime::Path&) [ONNXRuntimeError] : 1 : FAIL : tensorprotoutils.cc:789 GetExtDataFromTensorProto External initializer: positional_embedding offset: 0 size to read: 7680000 given file_length: 2293760 are out of bounds or can not be read in full.

Error: Exception during initialization: /onnxruntime_src/onnxruntime/core/optimizer/initializer.cc:43 onnxruntime::Initializer::Initializer(const onnx::TensorProto&, const onnxruntime::Path&) [ONNXRuntimeError] : 1 : FAIL : tensorprotoutils.cc:789 GetExtDataFromTensorProto External initializer: positional_embedding offset: 0 size to read: 7680000 given file_length: 2293760 are out of bounds or can not be read in full.

I am running the transcription on Ubuntu with Node v20.3.0, using this command: echogarden transcribe audio.wav audio.txt --language=lv --engine=whisper --whisper.model=large

rotemdan commented 7 months ago

I uploaded the model while trying to add support for the large Whisper models, especially v3.

The lack of support for the large models was also reported earlier, in an issue opened at the end of November. It seems that issue was deleted by the person who opened it (I only noticed this now). It contained some detailed and relevant information, but there's not much I can do to restore it.

Anyway, the large models can't be loaded because onnxruntime-node doesn't support loading models with external data (which includes all models over 2 GB).

On 27 November 2023, I opened an issue on the onnxruntime tracker requesting this capability.

I decided to upload the large-v3 Whisper model to the Hugging Face repository anyway, in order to test the download process, and in case anyone else wants to use it (say, with a different implementation of the ONNX runtime).

Currently, as you've seen, it doesn't work. It will only become usable once the issue is fixed in onnxruntime-node.

raivisdejus commented 7 months ago

Got it! Let's see how this develops, and thanks for creating the great echogarden :)

rotemdan commented 4 months ago

In v1.0.0, which I just released, I added support for whisper.cpp, which supports the large models (large-v1, large-v2, large-v3), quantized models, and GPU acceleration (CUDA and OpenCL).

All whisper.cpp models (including quantized models) are auto-downloaded. A CUDA build will be auto-downloaded on Windows when you set --enableGPU, but not on Linux. On macOS, you need to provide a custom executable path for whisper.cpp, since I don't have a Mac and the binary signature requirements are problematic.
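For readers landing here, usage might look roughly like the following. This is a sketch based on the CLI invocation shown earlier in this thread; apart from --enableGPU, the engine and option names are assumptions, so check echogarden's documentation or --help output for the actual flags:

```shell
# Hypothetical invocations; only --enableGPU is confirmed in this thread,
# the other option names are guesses based on the CLI's naming conventions.

# Transcribe with the whisper.cpp engine and a large model:
echogarden transcribe audio.wav audio.txt --engine=whisper.cpp --whisperCpp.model=large-v3

# On Windows, have echogarden auto-download and use the CUDA build:
echogarden transcribe audio.wav audio.txt --engine=whisper.cpp --whisperCpp.model=large-v3 --enableGPU

# On macOS, point to a locally built whisper.cpp executable (path is illustrative):
echogarden transcribe audio.wav audio.txt --engine=whisper.cpp --whisperCpp.executablePath=/path/to/whisper-cli
```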

I also added support for the OpenAI cloud platform (recognition and synthesis), which provides large-v2 on the cloud, but it's a paid service ($0.006 / minute), and the local whisper.cpp GPU build can be faster if you have a modern NVIDIA GPU.

Just note that the timestamps returned by whisper.cpp are not as accurate as the ones given by the integrated Whisper implementation (which is now getting close to the accuracy of the official OpenAI reference Python implementation). I've also spent a lot of effort over the past week improving it further and reducing hallucinations and repetition loops.