ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

Unable to generate large-v3 quantized coreml model #2042

Open dhruv-anand-aintech opened 4 months ago

dhruv-anand-aintech commented 4 months ago

I get the following error when trying to generate the large-v3 quantized coreml model:

$ ./models/generate-coreml-model.sh large-v3-q5_0
scikit-learn version 1.3.0 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
XGBoost version 1.7.6 has not been tested with coremltools. You may run into unexpected errors. XGBoost 1.4.2 is the most recent version that has been tested.
Traceback (most recent call last):
  File "/Users/dhruvanand/Code/whisper/whisper.cpp/models/convert-whisper-to-coreml.py", line 293, in <module>
    raise ValueError("Invalid model name")
ValueError: Invalid model name
coremlc: error: Model does not exist at models/coreml-encoder-large-v3-q5_0.mlpackage -- file:///Users/dhruvanand/Code/whisper/whisper.cpp/
mv: rename models/coreml-encoder-large-v3-q5_0.mlmodelc to models/ggml-large-v3-q5_0-encoder.mlmodelc: No such file or directory

I have ggml-large-v3-q5_0.bin present in my ./models.

Can someone help figure this out? I looked at a related issue (https://github.com/ggerganov/whisper.cpp/issues/1437) for the main large-v3 model, but the error in that is different from mine.

ialshjl commented 4 months ago

I'm encountering the same issue!

DainisGorbunovs commented 2 months ago

Whisper's CoreML model does not support quantization. See the related discussion at https://github.com/ggerganov/whisper.cpp/discussions/1829. Additionally, https://github.com/ggerganov/whisper.cpp/discussions/548#discussioncomment-5212701 and https://github.com/ggerganov/whisper.cpp/discussions/548#discussioncomment-5236094 indicate that quantization would not add a performance boost anyway.

For now, the generate-coreml-model script only generates a CoreML encoder for the regular (non-quantized) models. The decoder is not generated because running it on the CPU is faster than on the ANE, according to pull request https://github.com/ggerganov/whisper.cpp/pull/566 and discussion https://github.com/ggerganov/whisper.cpp/discussions/548#discussioncomment-5236094.

You can use the quantized decoder (large-v3-q5_0) with the regular CoreML encoder (large-v3). In fact, whisper.cpp is already looking for the regular CoreML encoder rather than the quantized one, as seen in whisper_get_coreml_path_encoder. A generated regular CoreML encoder is available in ggerganov's Hugging Face repository.
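To illustrate the lookup described above, here is a small Python sketch (not the actual C implementation in whisper_get_coreml_path_encoder) of how the CoreML encoder path can be derived from a quantized ggml model path: drop the quantization suffix and swap the .bin extension for -encoder.mlmodelc. The exact suffix pattern is an assumption based on whisper.cpp's quantized model naming (e.g. -q5_0).

```python
import re

def coreml_encoder_path(ggml_model_path: str) -> str:
    """Sketch of the encoder-path lookup: a quantized ggml model still
    maps to the regular (non-quantized) CoreML encoder."""
    path = ggml_model_path
    if path.endswith(".bin"):
        path = path[: -len(".bin")]
    # Strip a trailing quantization suffix such as -q5_0 or -q8_0
    # (assumption: suffix format follows whisper.cpp's naming scheme).
    path = re.sub(r"-q\d_\d$", "", path)
    return path + "-encoder.mlmodelc"

print(coreml_encoder_path("models/ggml-large-v3-q5_0.bin"))
# → models/ggml-large-v3-encoder.mlmodelc
```

This is why loading ggml-large-v3-q5_0.bin picks up the regular large-v3 CoreML encoder if it is present in the models directory.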

Example command for using the quantized decoder with the regular CoreML encoder:

./main -m models/ggml-large-v3-q5_0.bin -f samples/jfk.wav

Regarding the error message: the generate-coreml-model script fails because convert-whisper-to-coreml tries to load a PyTorch (.pt) model, and there isn't one for large-v3-q5_0. Although a ggml_to_pt.py script exists, it does not work for quantized models.
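The ValueError in the traceback comes from a model-name check of this general shape; the sketch below is hypothetical (the exact list of accepted names in convert-whisper-to-coreml.py is an assumption), but it shows why a quantized name like "large-v3-q5_0" is rejected while "large-v3" would pass.

```python
# Hypothetical sketch of the validation that raises
# "ValueError: Invalid model name" for quantized variants.
VALID_MODELS = {
    "tiny", "base", "small", "medium",
    "large", "large-v1", "large-v2", "large-v3",
}

def check_model_name(name: str) -> None:
    # Quantized names ("large-v3-q5_0") are not in the accepted set,
    # so the script bails out before any conversion happens.
    if name not in VALID_MODELS:
        raise ValueError("Invalid model name")
```

So the fix is to run the conversion for the plain model name (large-v3) and pair the resulting CoreML encoder with the quantized ggml model, as shown above.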