ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License

Compiling on Mac Pro 5.1 - Illegal instruction: 4 #597

Open fedeblock opened 1 year ago

fedeblock commented 1 year ago

sysctl reports an error even though it is on the expected path (/usr/sbin/sysctl), and the make output contains a linker warning about falling back to library files. I'm clearly not very experienced with compiling on a Mac.

Below are my target platform, the make output, and the error I get when running the example. Any help would be much appreciated.

macOS High Sierra 10.13.6
2 x Intel(R) Xeon(R) CPU X5675 @ 3.07GHz
NVIDIA GeForce GTX 980

sysctl: unknown oid 'hw.optional.arm64'
I whisper.cpp build info:
I UNAME_S:  Darwin
I UNAME_P:  i386
I UNAME_M:  x86_64
I CFLAGS:   -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mf16c -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS:  -framework Accelerate
I CC:       Apple LLVM version 10.0.0 (clang-1000.11.45.5)
I CXX:      Apple LLVM version 10.0.0 (clang-1000.11.45.5)

cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mf16c -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -c whisper.cpp -o whisper.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread examples/main/main.cpp examples/common.cpp ggml.o whisper.o -o main -framework Accelerate
ld: warning: text-based stub file /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libQuadrature.tbd and library file /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libQuadrature.dylib are out of sync. Falling back to library file for linking.
./main -h

usage: ./main [options] file0.wav file1.wav ...

options:
  -h, --help [default] show this help message and exit
  -t N, --threads N [4 ] number of threads to use during computation
  -p N, --processors N [1 ] number of processors to use during computation
  -ot N, --offset-t N [0 ] time offset in milliseconds
  -on N, --offset-n N [0 ] segment index offset
  -d N, --duration N [0 ] duration of audio to process in milliseconds
  -mc N, --max-context N [-1 ] maximum number of text context tokens to store
  -ml N, --max-len N [0 ] maximum segment length in characters
  -sow, --split-on-word [false ] split on word rather than on token
  -bo N, --best-of N [5 ] number of best candidates to keep
  -bs N, --beam-size N [-1 ] beam size for beam search
  -wt N, --word-thold N [0.01 ] word timestamp probability threshold
  -et N, --entropy-thold N [2.40 ] entropy threshold for decoder fail
  -lpt N, --logprob-thold N [-1.00 ] log probability threshold for decoder fail
  -su, --speed-up [false ] speed up audio by x2 (reduced accuracy)
  -tr, --translate [false ] translate from source language to english
  -di, --diarize [false ] stereo audio diarization
  -nf, --no-fallback [false ] do not use temperature fallback while decoding
  -otxt, --output-txt [false ] output result in a text file
  -ovtt, --output-vtt [false ] output result in a vtt file
  -osrt, --output-srt [false ] output result in a srt file
  -owts, --output-words [false ] output script for generating karaoke video
  -fp, --font-path [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
  -ocsv, --output-csv [false ] output result in a CSV file
  -of FNAME, --output-file FNAME [ ] output file path (without file extension)
  -ps, --print-special [false ] print special tokens
  -pc, --print-colors [false ] print colors
  -pp, --print-progress [false ] print progress
  -nt, --no-timestamps [true ] do not print timestamps
  -l LANG, --language LANG [en ] spoken language ('auto' for auto-detect)
  --prompt PROMPT [ ] initial prompt
  -m FNAME, --model FNAME [models/ggml-base.en.bin] model path
  -f FNAME, --file FNAME [ ] input WAV file path

./main -f samples/jfk.wav
whisper_init_from_file_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 2
whisper_model_load: mem required  =  215.00 MB (+    6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.60 MB
Illegal instruction: 4

ggerganov commented 1 year ago

Does it work if you remove the following line from the Makefile and then run make clean followed by make?

https://github.com/ggerganov/whisper.cpp/blob/4aa3bcf8a4d18fefab7f72eba0cdad1889dba08a/Makefile#L67
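For context on why a single compiler flag can matter here: the build output above passes -mf16c to the compiler, while the Xeon X5675 is a Westmere-generation CPU, which predates both AVX and F16C. The sketch below checks a feature list for a given flag before building. The helper name has_cpu_flag is made up for illustration, and the assumption is that on macOS the list comes from sysctl -n machdep.cpu.features; the exact spelling of individual flags in that output may differ between systems.

```shell
#!/bin/sh
# has_cpu_flag FLAG FEATURES...
# Hypothetical helper: succeeds (exit 0) if FLAG appears as a whole word
# in the space-separated FEATURES list, compared case-insensitively.
has_cpu_flag() {
    want=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    shift
    for f in $(printf '%s' "$*" | tr '[:lower:]' '[:upper:]'); do
        [ "$f" = "$want" ] && return 0
    done
    return 1
}

# Example use on macOS (assumes sysctl exposes machdep.cpu.features):
#   features=$(sysctl -n machdep.cpu.features)
#   has_cpu_flag F16C "$features" || echo "CPU lacks F16C, avoid -mf16c"
```

Matching whole words rather than substrings avoids treating a longer flag name as evidence for a shorter one.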

fedeblock commented 1 year ago

Lesson learned: always update before reporting a problem. If you have the same error, read on.

Removed it (it was line 88 for me). Nevertheless, I also tried removing line 67, which in my copy was: CFLAGS += -mavx

        ifneq (,$(findstring f16c,$(F16C_M)))
            CFLAGS += -mf16c
        endif

Then I ran:

make clean 
rm -f *.o main stream command talk bench libwhisper.a libwhisper.so

Same error. This is the output:

MacPro:whisper.cpp federico$ make
sysctl: unknown oid 'hw.optional.arm64'
I whisper.cpp build info:
I UNAME_S:  Darwin
I UNAME_P:  i386
I UNAME_M:  x86_64
I CFLAGS:   -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mf16c -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS:  -framework Accelerate
I CC:       Apple LLVM version 10.0.0 (clang-1000.11.45.5)
I CXX:      Apple LLVM version 10.0.0 (clang-1000.11.45.5)

cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mf16c -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread -c whisper.cpp -o whisper.o
c++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread examples/main/main.cpp examples/common.cpp ggml.o whisper.o -o main -framework Accelerate
ld: warning: text-based stub file /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libQuadrature.tbd and library file /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libQuadrature.dylib are out of sync. Falling back to library file for linking.
./main -h

usage: ./main [options] file0.wav file1.wav ...

options:
  -h, --help [default] show this help message and exit
  -t N, --threads N [4 ] number of threads to use during computation
  -p N, --processors N [1 ] number of processors to use during computation
  -ot N, --offset-t N [0 ] time offset in milliseconds
  -on N, --offset-n N [0 ] segment index offset
  -d N, --duration N [0 ] duration of audio to process in milliseconds
  -mc N, --max-context N [-1 ] maximum number of text context tokens to store
  -ml N, --max-len N [0 ] maximum segment length in characters
  -sow, --split-on-word [false ] split on word rather than on token
  -bo N, --best-of N [5 ] number of best candidates to keep
  -bs N, --beam-size N [-1 ] beam size for beam search
  -wt N, --word-thold N [0.01 ] word timestamp probability threshold
  -et N, --entropy-thold N [2.40 ] entropy threshold for decoder fail
  -lpt N, --logprob-thold N [-1.00 ] log probability threshold for decoder fail
  -su, --speed-up [false ] speed up audio by x2 (reduced accuracy)
  -tr, --translate [false ] translate from source language to english
  -di, --diarize [false ] stereo audio diarization
  -nf, --no-fallback [false ] do not use temperature fallback while decoding
  -otxt, --output-txt [false ] output result in a text file
  -ovtt, --output-vtt [false ] output result in a vtt file
  -osrt, --output-srt [false ] output result in a srt file
  -owts, --output-words [false ] output script for generating karaoke video
  -fp, --font-path [/System/Library/Fonts/Supplemental/Courier New Bold.ttf] path to a monospace font for karaoke video
  -ocsv, --output-csv [false ] output result in a CSV file
  -of FNAME, --output-file FNAME [ ] output file path (without file extension)
  -ps, --print-special [false ] print special tokens
  -pc, --print-colors [false ] print colors
  -pp, --print-progress [false ] print progress
  -nt, --no-timestamps [true ] do not print timestamps
  -l LANG, --language LANG [en ] spoken language ('auto' for auto-detect)
  --prompt PROMPT [ ] initial prompt
  -m FNAME, --model FNAME [models/ggml-base.en.bin] model path
  -f FNAME, --file FNAME [ ] input WAV file path

MacPro:whisper.cpp federico$ ./main -f samples/jfk.wav
whisper_init_from_file_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 2
whisper_model_load: mem required  =  215.00 MB (+    6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.60 MB
Illegal instruction: 4

fedeblock commented 1 year ago

Did a repo pull, changed line 67 and everything worked flawlessly.

These are the benchmark results for jfk.wav, in case you are interested:

whisper_init_from_file_no_state: loading model from 'models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 2
whisper_model_load: mem required  =  215.00 MB (+    6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.60 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB

system_info: n_threads = 4 / 24 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 |

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

[00:00:00.000 --> 00:00:11.000] And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.

whisper_print_timings:     load time =  1848.29 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =    70.25 ms
whisper_print_timings:   sample time =    26.44 ms /    27 runs (    0.98 ms per run)
whisper_print_timings:   encode time =  6469.38 ms /     1 runs ( 6469.38 ms per run)
whisper_print_timings:   decode time =  1000.83 ms /    27 runs (   37.07 ms per run)
whisper_print_timings:    total time =  9479.72 ms
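When comparing builds, it can help to pull the enabled features out of the system_info line shown above. This is a minimal sketch, assuming the line keeps the "NAME = 0" / "NAME = 1" fields separated by "|" as printed in this thread; enabled_features is a made-up helper name, not part of whisper.cpp.

```shell
#!/bin/sh
# enabled_features: reads a whisper.cpp "system_info:" line on stdin and
# prints the names of the fields whose value is 1, space-separated.
# Hypothetical helper; assumes "NAME = 0|1" fields separated by "|".
enabled_features() {
    tr '|' '\n' |
        sed -n 's/^ *\([A-Za-z0-9_]*\) = 1 *$/\1/p' |
        tr '\n' ' ' |
        sed 's/ $//'
}

# Example use:
#   ./main -f samples/jfk.wav 2>&1 | grep '^system_info' | enabled_features
```

For the system_info line in this comment, only BLAS and SSE3 are reported as enabled, so none of the AVX, FMA, or F16C fields are set in this build.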

Jeff-Prosser commented 1 year ago

I ran into the same issue, and the Makefile line 67 fix worked for me as well.

I am noticing that this build of the executable runs considerably slower than my other builds. What exactly does removing line 67 do to the file processing or processor utilization?
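One plausible explanation, not confirmed in this thread: flags such as -mavx and -mf16c make the compiler predefine macros like __AVX__ and __F16C__, and code that selects SIMD paths at compile time falls back to slower generic code when they are absent. The sketch below lists which of these macros a given set of flags defines. The helper name simd_macros is made up; the commented cc invocations assume a GCC- or Clang-compatible compiler on an x86 machine.

```shell
#!/bin/sh
# simd_macros: filters a compiler macro dump (the output of "cc -dM -E")
# on stdin and prints the x86 SIMD-related macro names that are defined,
# one per line. Hypothetical helper for illustration.
simd_macros() {
    awk '$1 == "#define" && $2 ~ /^__(AVX|AVX2|FMA|F16C|SSE3)__$/ { print $2 }'
}

# Example use (compare the two lists):
#   cc -dM -E -x c /dev/null | simd_macros
#   cc -mavx -mf16c -dM -E -x c /dev/null | simd_macros
```

If a macro disappears from the list after a flag is removed, any code guarded by that macro is no longer compiled in, which would be consistent with a slower but more portable binary.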

fedeblock commented 1 year ago

The same happened to me. My old Mac Pro performs a lot better under Ubuntu with PyTorch and CUDA.

sanchitram1 commented 1 year ago

I'm having the same error. I am trying to package whisper for the package manager tea, and one of the GitHub Actions checks on the repo verifies that the build works on an Intel Mac.

After fetching the latest release from GitHub, I apply a patch that removes line 67, but I still run into this error.

Specs are darwin+x86-64 macos-11.

Any thoughts? Happy to provide more details!

Here is the error output:

Done! Model 'base.en' saved in 'models/ggml-base.en.bin'
You can now use it like this:

  $ ./main -m models/ggml-base.en.bin -f samples/jfk.wav

/Users/runner/.tea/github.com/ggerganov/whisper.cpp/v1.3.0/tbin/models/base.en.bin
whisper_init_from_file_no_state: loading model from '/Users/runner/.tea/github.com/ggerganov/whisper.cpp/v1.3.0/tbin/models/ggml-base.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 2
whisper_model_load: mem required  =  218.00 MB (+    6.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     =  140.60 MB
whisper_model_load: model size    =  140.54 MB
whisper_init_state: kv self size  =    5.25 MB
whisper_init_state: kv cross size =   17.58 MB

system_info: n_threads = 3 / 3 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | COREML = 0 | 

main: processing '/Users/runner/.tea/github.com/ggerganov/whisper.cpp/v1.3.0/share/jfk.wav' (176000 samples, 11.0 sec), 3 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...

/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/afa95bc2/xyz.tea.test.sh: line 24:  3237 Illegal instruction: 4  whisper.main -f /Users/runner/.tea/github.com/ggerganov/whisper.cpp/v1.3.0/share/jfk.wav --print-colors
Error: Process completed with exit code 132.
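
The exit code 132 in the last line matches the crash message: shells conventionally report a process killed by signal N as 128 + N, and signal 4 is SIGILL (illegal instruction), which is also the "4" in "Illegal instruction: 4". A small sketch for decoding such a status follows; signal_from_status is a made-up helper name.

```shell
#!/bin/sh
# signal_from_status STATUS
# Hypothetical helper: for an exit status above 128, prints the name of the
# signal that (by shell convention) terminated the process. For any other
# status it prints nothing and returns 1.
signal_from_status() {
    [ "$1" -gt 128 ] || return 1
    kill -l "$(( $1 - 128 ))"
}

# Example use:
#   signal_from_status 132    # prints ILL
```

So the CI failure is the same kind of crash as on the Mac Pro: the binary executed an instruction that the machine running it does not support.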