ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

CoreML on M1 gives a wrong transcription #1901

Open nbarrera opened 6 months ago

nbarrera commented 6 months ago

Hi there, I am following the instructions to get CoreML working on an Apple Silicon M1.

After getting everything set up and trying to transcribe the jfk sample, I only get a wrong transcription:

[00:00:00.000 --> 00:00:30.000]   " in "

while the correct output is:

[00:00:00.300 --> 00:00:09.180]   And so, my fellow Americans, ask not what your country can do for you, ask what you
[00:00:09.180 --> 00:00:11.000]   can do for your country.

I think I did everything as instructed (Python 3.10, miniconda, installed the packages), but I am very new to all of this AI stuff.

The regular (not CoreML) model is working perfectly for me; I am just trying to see if I can get better performance out of my M1 chip.
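For reference, here is roughly the setup I ran, a sketch based on the whisper.cpp README's Core ML instructions at the time (exact package versions and build flags may differ between releases):

```shell
# Install the Python packages needed for Core ML model generation
# (per the whisper.cpp README; versions may vary).
pip install ane_transformers openai-whisper coremltools

# Generate the Core ML encoder for the model in use.
./models/generate-coreml-model.sh large-v3

# Rebuild whisper.cpp with Core ML support enabled.
make clean
WHISPER_COREML=1 make -j
```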

Thank you in advance, Nicolas.

(here's an excerpt of the output)

./main -m models/ggml-large-v3.bin -f samples/jfk.wav
...
whisper_init_state: loading Core ML model from 'models/ggml-large-v3-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     8.80 MiB, ( 3412.97 / 10922.67)
whisper_init_state: compute buffer (conv)   =   10.92 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =     7.33 MiB, ( 3420.30 / 10922.67)
whisper_init_state: compute buffer (cross)  =    9.38 MB
ggml_backend_metal_buffer_type_alloc_buffer: allocated buffer, size =   197.95 MiB, ( 3618.25 / 10922.67)
whisper_init_state: compute buffer (decode) =  209.26 MB

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | CUDA = 0 | COREML = 1 | OPENVINO = 0 | 

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:30.000]   " in "

whisper_print_timings:     load time =  1128.05 ms
whisper_print_timings:     fallbacks =   1 p /   0 h
whisper_print_timings:      mel time =     7.88 ms
whisper_print_timings:   sample time =    53.50 ms /    45 runs (    1.19 ms per run)
whisper_print_timings:   encode time =  1311.38 ms /     1 runs ( 1311.38 ms per run)
whisper_print_timings:   decode time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:   batchd time =   798.65 ms /    41 runs (   19.48 ms per run)
whisper_print_timings:   prompt time =     0.00 ms /     1 runs (    0.00 ms per run)
whisper_print_timings:    total time =  8631.10 ms
ggml_metal_free: deallocating
gavin1818 commented 6 months ago

I'm having the same issue. The regular (not CoreML) model is working for me, but the CoreML one is giving a wrong transcription.

gavin1818 commented 6 months ago

@nbarrera I solved the issue after updating to the latest macOS 14.3.1
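For anyone else hitting this: on a Mac you can read the installed version with `sw_vers -productVersion` and compare it against 14.3.1. A minimal portable sketch (the `version_ge` helper is hypothetical, not part of whisper.cpp):

```shell
# Hypothetical helper: succeeds (exit 0) if version $1 >= version $2.
# Relies on `sort -V` for natural version ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# On a Mac you would use: ver=$(sw_vers -productVersion)
ver="14.3.1"   # example value for illustration

if version_ge "$ver" "14.3.1"; then
  echo "macOS is new enough"
else
  echo "consider updating macOS"
fi
```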

nbarrera commented 6 months ago

Thank you! I thought about updating at one point, but I am very reluctant to update...

However, I do have a good reason to do it now.

I will find some time to update and test again so I can close the issue next week.

Thank you!!