dhruv-anand-aintech opened 7 months ago
I'm encountering the same issue!
Whisper's CoreML model does not support quantization; see the related discussion at https://github.com/ggerganov/whisper.cpp/discussions/1829. Additionally, https://github.com/ggerganov/whisper.cpp/discussions/548#discussioncomment-5212701 and https://github.com/ggerganov/whisper.cpp/discussions/548#discussioncomment-5236094 indicate that quantizing the CoreML model would not add a performance boost.
For now, the `generate-coreml-model` script only generates a CoreML encoder for the regular models. The decoder is not generated because running it on the CPU is faster than on the ANE, according to the https://github.com/ggerganov/whisper.cpp/pull/566 pull request and the https://github.com/ggerganov/whisper.cpp/discussions/548#discussioncomment-5236094 discussion.
You can use the quantized decoder (`large-v3-q5_0`) with the regular CoreML encoder (`large-v3`). In fact, whisper.cpp already looks for the regular CoreML encoder rather than a quantized one, as seen in `whisper_get_coreml_path_encoder`. A pre-generated regular CoreML encoder is available in ggerganov's Hugging Face repository.
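To illustrate the point about `whisper_get_coreml_path_encoder`, here is a simplified Python sketch of the path mapping: drop the `.bin` extension, strip a quantization suffix such as `-q5_0` if present, and append `-encoder.mlmodelc`. The real implementation is C++ inside whisper.cpp and may differ in detail; this is only a sketch of the idea.

```python
import re

def coreml_encoder_path(model_path: str) -> str:
    """Sketch: map a ggml model path to the CoreML encoder path
    whisper.cpp looks for. Not the actual implementation."""
    base = model_path.rsplit(".", 1)[0]        # drop the ".bin" extension
    base = re.sub(r"-q\d_\d$", "", base)       # strip a suffix like "-q5_0"
    return base + "-encoder.mlmodelc"

print(coreml_encoder_path("models/ggml-large-v3-q5_0.bin"))
# -> models/ggml-large-v3-encoder.mlmodelc
```

This is why the quantized decoder and the regular CoreML encoder pair up automatically: both `ggml-large-v3-q5_0.bin` and `ggml-large-v3.bin` resolve to the same `ggml-large-v3-encoder.mlmodelc`.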
Example command for using the quantized decoder with the regular CoreML encoder:

```shell
./main -m models/ggml-large-v3-q5_0.bin -f samples/jfk.wav
```
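Before running, it can help to confirm that both files are in place. A minimal check, assuming the file layout implied by the command above (adjust names and paths to your setup):

```python
from pathlib import Path

# Illustrative file layout (an assumption, not prescribed by whisper.cpp):
#   models/ggml-large-v3-q5_0.bin          - quantized ggml model (decoder runs on CPU)
#   models/ggml-large-v3-encoder.mlmodelc  - regular CoreML encoder (e.g. from Hugging Face)
def missing_files(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]

required = [
    "models/ggml-large-v3-q5_0.bin",
    "models/ggml-large-v3-encoder.mlmodelc",
]
for p in missing_files(required):
    print(f"missing: {p}")
```

If the `.mlmodelc` directory is absent, a CoreML-enabled build of whisper.cpp will not find the CoreML encoder at that path.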
Regarding the error message: the `generate-coreml-model` script is failing because `convert-whisper-to-coreml` tries to load a PT (PyTorch) model, and there isn't one for `large-v3-q5_0`. Although there is a `ggml_to_pt.py` script, it won't work for quantized models.
I get the following error when trying to generate the large-v3 quantized CoreML model:
I have `ggml-large-v3-q5_0.bin` present in my ./models. Can someone help figure this out? I looked at a related issue (https://github.com/ggerganov/whisper.cpp/issues/1437) for the main large-v3 model, but the error there is different from mine.