ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

Crash on iPhone when Using CoreML #775

Open leohuang2013 opened 1 year ago

leohuang2013 commented 1 year ago

I followed the instructions in the README to convert the CoreML model, then tested it on macOS, where it works perfectly.

I then copied the model into the SwiftUI example project, added the WHISPER_USE_COREML preprocessor definition and the coreml source files, then compiled and ran it on the device. It crashes with the error: Failure Reason: Message from debugger: Terminated due to memory issue
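
For reference, the loading path in the SwiftUI example looks roughly like the sketch below (Swift, calling the whisper.cpp C API through a bridging header; the exact wrapper code in the example project may differ):

```swift
import Foundation

// Minimal sketch, assuming the whisper.cpp C API is exposed via a bridging header.
// Both files must be bundled with the same base name, e.g.
//   models/ggml-base.en.bin
//   models/ggml-base.en-encoder.mlmodelc
func loadWhisperContext() -> OpaquePointer? {
    guard let modelPath = Bundle.main.path(forResource: "ggml-base.en",
                                           ofType: "bin",
                                           inDirectory: "models") else {
        return nil
    }
    // When built with WHISPER_USE_COREML, whisper.cpp derives the
    // "-encoder.mlmodelc" path from the ".bin" path and loads the
    // CoreML encoder during state initialization (see the log below).
    return whisper_init_from_file(modelPath)
}
```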

Debugger output:

whisper_init_from_file_no_state: loading model from '/private/var/containers/Bundle/Application/50F8C5E6-0550-4A36-AA0F-681BAE0531E6/whisper.swiftui.app/models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 2
whisper_model_load: mem required  =  218.00 MB (+    6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.60 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB
whisper_init_state: loading Core ML model from '/private/var/containers/Bundle/Application/50F8C5E6-0550-4A36-AA0F-681BAE0531E6/whisper.swiftui.app/models/ggml-base.en-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
2023-04-17 09:14:47.251839+0800 whisper.swiftui[28767:2090946] Metal API Validation Enabled

If I run this project on macOS, it works.

bjnortier commented 1 year ago

For the "small" model, anything with less memory than an iPhone 12 will likely crash. The iPhone 12 is marginal, and it helps if you close all other apps.
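
If you want to gate the model choice on the device's actual headroom rather than the model name, something like this can help (a rough sketch; the thresholds are illustrative guesses, not measured values):

```swift
import os

// os_proc_available_memory() (iOS 13+) reports how much memory the app can
// still allocate before hitting the system limit. It is only meaningful on a
// real iOS device; on the simulator it may return 0.
func recommendedModel() -> String {
    let availableMB = os_proc_available_memory() / (1024 * 1024)
    if availableMB > 2_500 {
        return "small"      // roughly iPhone 12-class devices and newer
    } else if availableMB > 1_000 {
        return "base.en"
    } else {
        return "tiny.en"
    }
}
```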

leohuang2013 commented 1 year ago

Thanks @bjnortier for the quick reply. I previously used the code from commit 09e90680072d8ecdf02eaf21c393218385d2c616.

It worked perfectly on the same iPhone. Does this mean memory usage has increased significantly since that commit? Is it possible for CoreML to use the same level of memory?

bjnortier commented 1 year ago

When you load a CoreML model it is optimised on the device, hence the "first run on a device may take a while ..." output. Afaik this is an internal operation and cannot be pre-computed (e.g. the model cannot be optimised on another iPhone and then copied over).

This process requires a lot of memory. So if you compile with CoreML, the first time the model loads it will consume a lot of memory and might crash, where it wouldn't have before on the same iPhone.
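
To make that concrete, the loading step is roughly equivalent to the Swift sketch below (whisper.cpp does this in Objective-C under coreml/, and its exact compute-unit configuration may differ); the MLModel initializer is where the device-specific specialization, and the memory spike, happens on the first run:

```swift
import CoreML

// Illustrative only: load a compiled .mlmodelc bundle the way Core ML expects.
func loadEncoder(at url: URL) throws -> MLModel {
    let config = MLModelConfiguration()
    // .all allows the Neural Engine; the first load triggers Core ML's
    // on-device specialization for the selected compute units.
    config.computeUnits = .all
    return try MLModel(contentsOf: url, configuration: config)
}
```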

I don't understand the question "Is it possible we use same level of memory for CoreML?"

leohuang2013 commented 1 year ago

"When you load a CoreML model it is optimised on the device" - is the optimized model saved to local storage, or is it kept only in memory? If it is the latter, then every time I restart the app it will run the optimization again.

"Is it possible for CoreML to use the same level of memory?" What I mean is: if normal memory usage when loading the whisper ggml model is 300+MB, can the CoreML model also be loaded within roughly 300+MB?

If that is not possible, what is the approximate memory usage for CoreML model loading/optimization?
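
One way to answer this empirically would be to log the app's physical memory footprint before and after model loading, once with the plain ggml build and once with the WHISPER_USE_COREML build. A sketch of a helper for that (standard Mach task_info pattern; the helper name is mine, not part of whisper.cpp):

```swift
import Darwin

// Returns the app's physical memory footprint in MB (the figure the OS uses
// when deciding to terminate the process), or nil if the query fails.
func currentFootprintMB() -> Double? {
    var info = task_vm_info_data_t()
    var count = mach_msg_type_number_t(
        MemoryLayout<task_vm_info_data_t>.size / MemoryLayout<integer_t>.size)
    let kr = withUnsafeMutablePointer(to: &info) {
        $0.withMemoryRebound(to: integer_t.self, capacity: Int(count)) {
            task_info(mach_task_self_, task_flavor_t(TASK_VM_INFO), $0, &count)
        }
    }
    guard kr == KERN_SUCCESS else { return nil }
    return Double(info.phys_footprint) / (1024 * 1024)
}
```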