ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

coreml medium.en model takes a very long time to run every time #937

Open · hirendra opened 1 year ago

hirendra commented 1 year ago

```
./main -m models/ggml-medium.en.bin -f samples/jfk.wav
...
whisper_init_state: loading Core ML model from 'models/ggml-medium.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
```

I ran the above command 3 times and these are the results.
```
whisper_print_timings: total time = 11890580.00 ms
whisper_print_timings: total time = 11944257.00 ms
whisper_print_timings: total time = 11783808.00 ms
```

I'm running this on an M1 Max with 64 GB RAM, Ventura 13.3.1(a), and Python 3.10 via conda.

hoonlight commented 1 year ago

Try killing ANECompilerService. Then the model will be loaded immediately and the job will begin.

https://github.com/ggerganov/whisper.cpp/issues/773#issuecomment-1533000588
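For reference, the workaround is just killing the process; a minimal sketch, assuming the standard macOS `killall` utility and that the service appears under exactly this process name:

```
# Kill the Apple Neural Engine compiler service if it is running.
# 2>/dev/null suppresses "No matching processes" when it isn't.
# May need to be re-run if macOS respawns the service.
killall ANECompilerService 2>/dev/null
```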

janngobble commented 1 year ago

Killing ANECompilerService works. Is there something about the way it's being called that makes it churn before it realizes the model has already been generated?

sam3d commented 1 year ago

As part of running the model inference, I have another script that starts in the background, waits for ANECompilerService to start, and then kills it immediately. Kinda wild as far as solutions go. I wonder if there's some way to bypass running this service entirely. I've also seen some issues mention that a Swift(UI) caller can prevent this problem, so I wonder if this would be a valid bodge fix:

Calling client (cpp, nodejs, bash script, etc) --> Swift wrapper --> whisper.cpp CoreML
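In shell terms, that background watcher might look something like the sketch below (a hypothetical reconstruction, not the actual script; assumes macOS `pgrep` and `killall`):

```
# Hypothetical watcher: poll until ANECompilerService appears, then kill it.
(
  while ! pgrep -x ANECompilerService > /dev/null; do
    sleep 0.5
  done
  killall ANECompilerService
) &

# Run inference as usual; the watcher fires once the compile service spins up.
./main -m models/ggml-medium.en.bin -f samples/jfk.wav
```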

janngobble commented 1 year ago

> As part of running the model inference, I have another script that starts in the background, waits for ANECompilerService to start, and then kills it immediately. Kinda wild as far as solutions go. I wonder if there's some way to bypass running this service entirely. I've also seen some issues mention that a Swift(UI) caller can prevent this problem, so I wonder if this would be a valid bodge fix:
>
> Calling client (cpp, nodejs, bash script, etc) --> Swift wrapper --> whisper.cpp CoreML

Well, I packaged whisper.cpp and OpenAI's whisper inside a Perl script so I can call either on a per-file basis. So I'll try scanning for ANECompilerService and killing it inside that Perl wrapper… but how would we know whether it NEEDED to be called vs. killed, as in it wasn't the first run for that language model?

Would running (for instance)

```
./models/generate-coreml-model.sh modelname
```

on each model once (and every time a new model was released) ensure we didn’t need to do the first-run compile? @ggerganov

Just wondering. Thanks!
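For concreteness, such a one-time pre-generation pass might look like the sketch below. The model names are just examples, and whether this actually avoids the per-device first-run compile is exactly the open question here:

```
# Hypothetical one-time pass: generate the Core ML encoder for each model used,
# repeated whenever a new model is released.
for model in base.en small.en medium.en; do
  ./models/generate-coreml-model.sh "$model"
done
```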

sam3d commented 1 year ago

> but how would we know whether it NEEDED to be called vs. killed, as in it wasn't the first run for that language model?

I thought about this too, but I couldn't reproduce it locally because I don't know where the model cache is - so I can't delete it and test. Presumably it doesn't just modify the file in place?

janngobble commented 1 year ago

> but how would we know whether it NEEDED to be called vs. killed, as in it wasn't the first run for that language model?
>
> I thought about this too, but I couldn't reproduce it locally because I don't know where the model cache is, so I can't delete it and test.

This (plus the hallucinations on long files, which force a re-run) just totally negates any benefit of using CoreML over the normal non-CoreML builds of whisper.cpp until it is addressed.

Sponge-bink commented 1 year ago

@janngobble I see this behavior with non-CoreML builds of whisper.cpp too…