Closed: maxlund closed this issue 2 months ago
Hi @maxlund, this is unlikely to be related to the model being pre-downloaded. The additional file you observed is the system cache that Core ML has to generate so that the generic model is specialized for your device's chip generation and the selected compute unit. When you first load the model, the cache is generated, which takes a long time. We do have a solution coming up for this, so stay tuned. This cache cannot be pre-populated with publicly available APIs.
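For anyone wondering what triggers the specialization: it is the ordinary Core ML model load. A minimal Swift sketch of that step (the model path and compute-unit choice here are illustrative, not WhisperKit's exact code):

```swift
import CoreML

// Illustrative sketch of the load that triggers the specialization described above.
let modelURL = URL(fileURLWithPath: "/path/to/models/AudioEncoder.mlmodelc")

let config = MLModelConfiguration()
// Requesting the Neural Engine (directly or via .all) is what forces Core ML /
// ANECompilerService to specialize the generic model for this chip generation.
config.computeUnits = .cpuAndNeuralEngine

do {
    // First load: slow, because the device-specific cache is generated.
    // Later loads: fast, because the cached specialization is reused.
    let model = try MLModel(contentsOf: modelURL, configuration: config)
    print("Loaded:", model.modelDescription)
} catch {
    print("Model load failed:", error)
}
```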
Thank you for the clarification! Any idea how different those files are from machine to machine? Could we pre-create the files for, e.g., the distil large-v3 model and bundle them in an application to run on macOS?
Hi @atiorh! Following on from Max's questions...
> The additional file you observed is the system cache that Core ML has to generate so the generic model is specialized to your device's chip generation and the selected compute unit. When you first load the model, the cache is generated which takes a long time.
Are you able to give some more information/insight into this? Does this process happen faster on some machines than others? Does it only happen on macOS 15 and above?
> We do have a solution coming up for this so stay tuned.
Are there any WIP branches I could play with? Is there a rough ETA of when this could be improved - weeks or months?
> This cache cannot be pre-populated with publicly available APIs.
Is this because it's proprietary CoreML magic that's happening under the macOS hood?
Thanks heaps for your time!
...and a dumb question: whilst the CoreML cache is being generated, would it be possible to just do WhisperKit processing on the CPU and GPU, and then switch to the Apple Neural Engine/NPU once the cache is ready?
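For reference, here is roughly what that idea looks like with plain Core ML. This is a sketch of the concept, not an existing WhisperKit option, and the names and paths are placeholders:

```swift
import CoreML

// Sketch of the "start on CPU/GPU, switch to the ANE later" idea using plain Core ML.
let modelURL = URL(fileURLWithPath: "/path/to/models/TextDecoder.mlmodelc")

func loadModel(_ units: MLComputeUnits) -> MLModel? {
    let config = MLModelConfiguration()
    config.computeUnits = units
    return try? MLModel(contentsOf: modelURL, configuration: config)
}

// 1. Fast path: CPU+GPU needs no ANE specialization, so it is usable almost immediately.
var activeModel = loadModel(.cpuAndGPU)

// 2. Slow path: this first ANE load is what generates the device-specific cache.
//    In a real app it would run in the background; once it finishes, swap over
//    so subsequent predictions use the Neural Engine.
if let aneModel = loadModel(.cpuAndNeuralEngine) {
    activeModel = aneModel
}
```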
Playing around with @ZachNagengast's branches - it looks like MLX might solve this?
@atiorh I am unfortunately experiencing issues where this delay happens more than once for the same model, even when it is located at the same path as before. After a reboot, we see ANECompilerService taking up a lot of resources and the model taking ~9 min to produce a transcription result. Any guidance on possible ways to mitigate this would be much appreciated!
MLX definitely helps with this, as long as you can spare the memory to load both models at the same time. We also rolled out some experimental changes in the latest TestFlight; try it out and see how the speeds compare. It's enabled in the Experimental section of the settings.
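One mitigation sketch for the post-reboot case (not an official WhisperKit API; the model file names and folder layout below are assumptions about a typical WhisperKit model folder): pre-warm the models in the background at app launch, so the specialization cost is paid before the first transcription.

```swift
import CoreML

// Sketch: pay the ANE specialization cost at app launch instead of on the
// first transcription. Model names and folder layout are assumptions.
func prewarmModels(in modelFolder: URL) {
    Task.detached(priority: .utility) {
        let config = MLModelConfiguration()
        config.computeUnits = .cpuAndNeuralEngine

        // Loading each compiled model once forces Core ML to (re)generate its
        // device-specific cache before the user asks for a transcription.
        for name in ["MelSpectrogram", "AudioEncoder", "TextDecoder"] {
            let url = modelFolder.appendingPathComponent("\(name).mlmodelc")
            _ = try? MLModel(contentsOf: url, configuration: config)
        }
    }
}
```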
Hi!

When using --model-path with a pre-downloaded model, there seems to be something that makes CLI calls stall: as you can see, it took almost 9 minutes to initialise the model.

The second time we run the CLI call, however, it's fast.

It seems files are still being created in ~/Library/Caches/whisperkit-cli, and that directory is about 1.5 GB even though I specified --model-path. I suspect this has something to do with it?

Is there anything you can pre-download, or files you can pre-populate, to make startup faster (and possible at all) in a completely offline environment?
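For reference, a small Swift snippet to measure how much has been written to that cache directory (the path is the one reported above and may differ between machines and OS versions):

```swift
import Foundation

// Measure the size of the WhisperKit CLI cache reported in this issue.
let cacheURL = FileManager.default.homeDirectoryForCurrentUser
    .appendingPathComponent("Library/Caches/whisperkit-cli")

var totalBytes: Int64 = 0
if let enumerator = FileManager.default.enumerator(at: cacheURL,
                                                   includingPropertiesForKeys: [.fileSizeKey]) {
    for case let fileURL as URL in enumerator {
        if let values = try? fileURL.resourceValues(forKeys: [.fileSizeKey]),
           let size = values.fileSize {
            totalBytes += Int64(size)
        }
    }
}
print("Cache size:", ByteCountFormatter.string(fromByteCount: totalBytes, countStyle: .file))
```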