ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

Not able to generate core models for Apple Silicon devices #1080

Open vubui opened 1 year ago

vubui commented 1 year ago

I followed the steps here: https://github.com/ggerganov/whisper.cpp#core-ml-support and ran

./models/generate-coreml-model.sh medium
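For reference, the full sequence I ran, taken from the Core ML section of the README at the time (package names and build flags may have changed since), was roughly:

# Python 3.10 environment with the conversion dependencies
conda create -n py310-whisper python=3.10 -y
conda activate py310-whisper
pip install ane_transformers openai-whisper coremltools

# generate the Core ML encoder, then rebuild whisper.cpp with Core ML enabled
./models/generate-coreml-model.sh medium
make clean
WHISPER_COREML=1 make -j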

I tried a few times and let it run for about an hour each time, but the process typically hung at

Running MIL backend_mlprogram pipeline: 100%|██████████| 10/10 [00:00<00:00, 331.43 passes/s]

I don't know enough to dig into the root cause, so I'm hoping someone can shed some light.

nloveladyallen commented 1 year ago

It finished for me; it just takes a long time (multiple hours). An additional progress bar for whichever step is taking so long would be useful.
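If you want to check that it is still making progress rather than truly hung, my rough understanding (an assumption on my part) is that the slow step is Apple's Core ML / ANE compilation that the script kicks off after "done converting", so you can watch whether coremlc or ANECompilerService is still using CPU:

ps aux | grep -iE 'coremlc|anecompilerservice' | grep -v grep

or just look for ANECompilerService in Activity Monitor.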

vadi2 commented 1 year ago

It finished successfully for me as well. That said, generating the medium model took 1-2 hours, and the first transcription run took 3 hours. Any subsequent runs were crazy quick.
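For anyone wondering what the "first transcription run" involves: with whisper.cpp built with WHISPER_COREML=1, the generated encoder (e.g. models/ggml-medium-encoder.mlmodelc) is picked up automatically when it sits next to the regular ggml model, so a plain run such as

./main -m models/ggml-medium.bin -f samples/jfk.wav

triggers the one-time on-device compilation, and later runs reuse the cached result. (That automatic lookup by filename is how I read the README; adjust the paths to your setup.)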

dubefab commented 1 year ago

Running into the same problem; I guess I'll wait and see! The large model must be brutal.

Exfruit commented 1 year ago

Hi, I've been following the Core ML support section but can't get it to work. I'm using an Apple Silicon M1, and when I run

./models/generate-coreml-model.sh base.en

I get this:

(py310-whisper) exfruit@macbook whisper.cpp % ./models/generate-coreml-model.sh base.en
/Users/exfruit/miniconda3/envs/py310-whisper/lib/python3.10/site-packages/whisper/timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
  def backtrace(trace: np.ndarray):
ModelDimensions(n_mels=80, n_audio_ctx=1500, n_audio_state=512, n_audio_head=8, n_audio_layer=6, n_vocab=51864, n_text_ctx=448, n_text_state=512, n_text_head=8, n_text_layer=6)
/Users/exfruit/miniconda3/envs/py310-whisper/lib/python3.10/site-packages/whisper/model.py:166: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1:] == self.positional_embedding.shape, "incorrect audio shape"
/Users/exfruit/miniconda3/envs/py310-whisper/lib/python3.10/site-packages/whisper/model.py:97: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  scale = (n_state // self.n_head) ** -0.25
Converting PyTorch Frontend ==> MIL Ops: 100%|█████████████████████▉| 531/532 [00:00<00:00, 5504.81 ops/s]
Running MIL frontend_pytorch pipeline: 100%|██████████████████████████| 5/5 [00:00<00:00, 545.48 passes/s]
Running MIL default pipeline: 100%|██████████████████████████████████| 57/57 [00:00<00:00, 75.12 passes/s]
Running MIL backend_mlprogram pipeline: 100%|██████████████████████| 10/10 [00:00<00:00, 1772.37 passes/s]
done converting
coremlc: error: Error reading protobuf spec. validator error: Model specification version field missing or corrupt.
mv: rename models/coreml-encoder-base.en.mlmodelc to models/ggml-base.en-encoder.mlmodelc: No such file or directory

I'm not sure whether this is about arm64 vs. x86_64, but the solution from the previous issue doesn't fix it for me.

Makefile:32: Your arch is announced as x86_64, but it seems to actually be ARM64. Not fixing that can lead to bad performance. For more info see: https://github.com/ggerganov/whisper.cpp/issues/66#issuecomment-1282546789
I whisper.cpp build info: 
I UNAME_S:  Darwin
I UNAME_P:  i386
I UNAME_M:  x86_64
I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -D_DARWIN_C_SOURCE -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -D_DARWIN_C_SOURCE -pthread
I LDFLAGS:   -framework Accelerate
I CC:       Apple clang version 12.0.0 (clang-1200.0.32.29)
I CXX:      Apple clang version 12.0.0 (clang-1200.0.32.29)
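In case that arch mismatch is the culprit, here is what I plan to check, a sketch assuming the shell or conda install is Rosetta-translated as discussed in the linked issue:

# should print arm64 on Apple Silicon; the second command prints 1 if the shell is Rosetta-translated
uname -m
sysctl -in sysctl.proc_translated

# force a native arm64 build from a translated shell
make clean
WHISPER_COREML=1 arch -arm64 make -j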
fastrick commented 1 year ago

You need to install Xcode from the App Store.
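After installing the full Xcode (not just the Command Line Tools), it may also be necessary to point the developer directory at it so that coremlc resolves; a minimal sketch, assuming Xcode is in its default location:

sudo xcode-select -s /Applications/Xcode.app/Contents/Developer
sudo xcodebuild -license accept
xcrun --find coremlc   # should print a path inside Xcode.app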

Exfruit commented 1 year ago

omg, I'm so sorry, I had forgotten to update my Xcode for a long while, and now it works! It took a total of 12 hours to generate the Core ML large model, and I need to force-quit ANECompilerService every time I want to transcribe a video, but transcription is a lot faster now. Thank you so much!
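(In case it helps anyone else: the force-quit I do is roughly equivalent to

pkill -f ANECompilerService

though I'm only assuming the process name matches what Activity Monitor shows on my machine.)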

AlexandreCassagne commented 11 months ago

@Exfruit Would you be comfortable sharing the file? I tried to generate it, but my disk filled up (I'm guessing from swap; 32 GB RAM here).

Exfruit commented 4 months ago

@AlexandreCassagne Hey, I'm so sorry I missed your message for months 🥲 If you still need the file, I can share a Google Drive link or something similar, because the file is just too big to attach here.