hirendra opened this issue 1 year ago
Try killing ANECompilerService. Then the model will be loaded immediately and the job will begin.
https://github.com/ggerganov/whisper.cpp/issues/773#issuecomment-1533000588
Killing ANECompilerService works. Is there something with the way it's being called that makes it churn before it realizes that the model is already generated?
As part of running the model inference, I have another script that starts in the background, waits for the ANECompilerService to start, and then kills it immediately. Kinda wild as far as solutions go. Wonder if there's some way to bypass running this service entirely. I've also seen some issues mention that a Swift(UI) caller can prevent this issue, so I wonder if this would be a valid bodge fix:
Calling client (cpp, nodejs, bash script, etc.) --> Swift wrapper --> whisper.cpp CoreML
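The background watcher described above could look something like the sketch below. `kill_when_seen` is a hypothetical helper invented for this sketch; `pgrep` and `pkill` ship with macOS.

```shell
#!/bin/bash
# Sketch of the watcher: poll until a process with the given name appears,
# then kill it. Returns 0 if found and killed, 1 if it never showed up
# within tries * 0.5 seconds.
kill_when_seen() {
  local name="$1" tries="$2" i
  for ((i = 0; i < tries; i++)); do
    if pgrep -x "$name" >/dev/null 2>&1; then
      pkill -x "$name"
      return 0
    fi
    sleep 0.5
  done
  return 1
}

# Intended use, launched alongside the inference run (not executed here):
#   kill_when_seen ANECompilerService 600 &
#   ./main -m models/ggml-medium.en.bin -f samples/jfk.wav
```

Launching it with `&` before starting whisper.cpp reproduces the "script that starts running in the background" approach without manual intervention.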
Well, I packaged whisper.cpp and OpenAI whisper inside a Perl script in order to call either on a per-file basis. So I’ll try scanning for the ANECompilerService and then killing it inside that Perl wrapper… but how would we know if it NEEDED to be called vs. killed, i.e. that it wasn’t the first run for that language model?
Would running (for instance)
./models/generate-coreml-model.sh modelname
on each model once (and every time a new model was released) ensure we didn’t need to do the first-run compile? @ggerganov
Just wondering. Thanks!
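The pre-generation idea above could be scripted as a small loop. `pregen_models` is a hypothetical helper and the model names are illustrative; whether pre-generating actually avoids the per-device first-run compile is exactly the open question being asked here.

```shell
#!/bin/bash
# Sketch: run the CoreML conversion once per model up front so it never
# happens lazily. The generator path is passed in so the loop can be
# pointed at whisper.cpp's ./models/generate-coreml-model.sh.
pregen_models() {
  local gen="$1"; shift
  local m
  for m in "$@"; do
    "$gen" "$m" || return 1   # stop on the first failed conversion
  done
}

# Intended use (not executed here):
#   pregen_models ./models/generate-coreml-model.sh tiny.en base.en small.en medium.en
```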
but how would we know if it NEEDED to be called vs killed - as in it wasn’t the first run for that language model?
I thought about this too, but I couldn't reproduce it locally because I don't know where the model cache is - so I can't delete it and test. Presumably it doesn't just modify the file in place?
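Since the location of CoreML's own cache is unknown, one hedged workaround is to keep separate bookkeeping: touch a per-model marker file after the first successful run, and only treat ANECompilerService as killable for models that have completed a run before. Everything below (`MARKER_DIR`, both helper names) is invented for this sketch, not part of whisper.cpp or CoreML.

```shell
#!/bin/bash
# Hypothetical bookkeeping to approximate "was this the first run?"
# without knowing where CoreML caches its compiled models.
MARKER_DIR="${MARKER_DIR:-$HOME/.whisper-ane-markers}"

already_compiled() {      # usage: already_compiled medium.en
  [ -e "$MARKER_DIR/$1" ]
}

mark_compiled() {         # call after a run finishes successfully
  mkdir -p "$MARKER_DIR"
  touch "$MARKER_DIR/$1"
}
```

A wrapper could then skip the kill on the genuine first run for a model and apply it on every run after that.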
This (plus the hallucinations on long files, which require a re-run) totally negates any benefit of using CoreML over the normal non-CoreML builds of whisper.cpp until it is addressed.
@janngobble I see this behavior from non-CoreML builds of whisper.cpp too…
./main -m models/ggml-medium.en.bin -f samples/jfk.wav
...
whisper_init_state: loading Core ML model from 'models/ggml-medium.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
I ran the above command 3 times and these are the results.
whisper_print_timings: total time = 11890580.00 ms
whisper_print_timings: total time = 11944257.00 ms
whisper_print_timings: total time = 11783808.00 ms
I'm running this on an M1 Max with 64 GB RAM, Ventura 13.3.1 (a), and Python 3.10 using conda.