ggerganov / whisper.cpp

Port of OpenAI's Whisper model in C/C++
MIT License
33.52k stars 3.4k forks source link

Illegal instruction (core dumped) #83

Closed dzach closed 1 year ago

dzach commented 1 year ago

Any suggestions? While I get no problems during make, running main or stream with either the tiny or the base model fails with a core dump:

make tiny.en
cc  -O3 -std=c11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread -mavx -mavx2 -mfma -mf16c   -c ggml.c
g++ -O3 -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread -c whisper.cpp
g++ -O3 -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -pthread main.cpp whisper.o ggml.o -o main 
./main -h

usage: ./main [options] file0.wav file1.wav ...

options:
  -h,       --help           show this help message and exit
  -s SEED,  --seed SEED      RNG seed (default: -1)
  -t N,     --threads N      number of threads to use during computation (default: 4)
  -ot N,    --offset-t N     time offset in milliseconds (default: 0)
  -on N,    --offset-n N     segment index offset (default: 0)
  -v,       --verbose        verbose output
            --translate      translate from source language to english
  -otxt,    --output-txt     output result in a text file
  -ovtt,    --output-vtt     output result in a vtt file
  -osrt,    --output-srt     output result in a srt file
  -ps,      --print_special  print special tokens
  -pc,      --print_colors   print colors
  -nt,      --no_timestamps  do not print timestamps
  -l LANG,  --language LANG  spoken language (default: en)
  -m FNAME, --model FNAME    model path (default: models/ggml-base.en.bin)
  -f FNAME, --file FNAME     input WAV file path

bash ./download-ggml-model.sh tiny.en
Downloading ggml model tiny.en ...
Model tiny.en already exists. Skipping download.

===============================================
Running tiny.en on all samples in ./samples ...
===============================================

----------------------------------------------
[+] Running base.en on samples/jfk.wav ... (run 'ffplay samples/jfk.wav' to listen)
----------------------------------------------

whisper_model_load: loading model from 'models/ggml-tiny.en.bin'
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 1
whisper_model_load: mem_required  = 390.00 MB
whisper_model_load: adding 1607 extra tokens
whisper_model_load: ggml ctx size =  84.99 MB
Illegal instruction (core dumped)

4 core CPU, running Kubuntu 20.04, 6GB RAM

cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 18
model           : 1
model name      : AMD A6-3650 APU with Radeon(tm) HD Graphics
stepping        : 0
microcode       : 0x3000027
cpu MHz         : 1678.858
cache size      : 1024 KB
physical id     : 0
siblings        : 4
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 6
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate vmmcall arat npt lbrv svm_lock nrip_save pausefilter
bugs            : fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2
bogomips        : 5188.42
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

And also:

gcc -c -Q -march=native --help=target
The following options are target specific:
  -m128bit-long-double                  [enabled]
  -m16                                  [disabled]
  -m32                                  [disabled]
  -m3dnow                               [enabled]
  -m3dnowa                              [enabled]
  -m64                                  [enabled]
  -m80387                               [enabled]
  -m8bit-idiv                           [disabled]
  -m96bit-long-double                   [disabled]
  -mabi=                                sysv
  -mabm                                 [enabled]
  -maccumulate-outgoing-args            [disabled]
  -maddress-mode=                       long
  -madx                                 [disabled]
  -maes                                 [disabled]
  -malign-data=                         compat
  -malign-double                        [disabled]
  -malign-functions=                    0
  -malign-jumps=                        0
  -malign-loops=                        0
  -malign-stringops                     [enabled]
  -mandroid                             [disabled]
  -march=                               amdfam10
  -masm=                                att
  -mavx                                 [disabled]
  -mavx2                                [disabled]
  -mavx256-split-unaligned-load         [disabled]
  -mavx256-split-unaligned-store        [disabled]
  -mavx5124fmaps                        [disabled]
  -mavx5124vnniw                        [disabled]
  -mavx512bitalg                        [disabled]
  -mavx512bw                            [disabled]
  -mavx512cd                            [disabled]
  -mavx512dq                            [disabled]
  -mavx512er                            [disabled]
  -mavx512f                             [disabled]
  -mavx512ifma                          [disabled]
  -mavx512pf                            [disabled]
  -mavx512vbmi                          [disabled]
  -mavx512vbmi2                         [disabled]
  -mavx512vl                            [disabled]
  -mavx512vnni                          [disabled]
  -mavx512vpopcntdq                     [disabled]
  -mbionic                              [disabled]
  -mbmi                                 [disabled]
  -mbmi2                                [disabled]
  -mbranch-cost=<0,5>                   2
  -mcall-ms2sysv-xlogues                [disabled]
  -mcet-switch                          [disabled]
  -mcld                                 [disabled]
  -mcldemote                            [disabled]
  -mclflushopt                          [disabled]
  -mclwb                                [disabled]
  -mclzero                              [disabled]
  -mcmodel=                             [default]
  -mcpu=                      
  -mcrc32                               [disabled]
  -mcx16                                [enabled]
  -mdispatch-scheduler                  [disabled]
  -mdump-tune-features                  [disabled]
  -mf16c                                [disabled]
  -mfancy-math-387                      [enabled]
  -mfentry                              [disabled]
  -mfentry-name=              
  -mfentry-section=           
  -mfma                                 [disabled]
  -mfma4                                [disabled]
  -mforce-drap                          [disabled]
  -mforce-indirect-call                 [disabled]
  -mfp-ret-in-387                       [enabled]
  -mfpmath=                             sse
  -mfsgsbase                            [disabled]
  -mfunction-return=                    keep
  -mfused-madd                
  -mfxsr                                [enabled]
  -mgeneral-regs-only                   [disabled]
  -mgfni                                [disabled]
  -mglibc                               [enabled]
  -mhard-float                          [enabled]
  -mhle                                 [disabled]
  -miamcu                               [disabled]
  -mieee-fp                             [enabled]
  -mincoming-stack-boundary=            0
  -mindirect-branch-register            [disabled]
  -mindirect-branch=                    keep
  -minline-all-stringops                [disabled]
  -minline-stringops-dynamically        [disabled]
  -minstrument-return=                  none
  -mintel-syntax              
  -mlarge-data-threshold=<number>       65536
  -mlong-double-128                     [disabled]
  -mlong-double-64                      [disabled]
  -mlong-double-80                      [enabled]
  -mlwp                                 [disabled]
  -mlzcnt                               [enabled]
  -mmanual-endbr                        [disabled]
  -mmemcpy-strategy=          
  -mmemset-strategy=          
  -mmitigate-rop                        [disabled]
  -mmmx                                 [enabled]
  -mmovbe                               [disabled]
  -mmovdir64b                           [disabled]
  -mmovdiri                             [disabled]
  -mmpx                                 [disabled]
  -mms-bitfields                        [disabled]
  -mmusl                                [disabled]
  -mmwaitx                              [disabled]
  -mno-align-stringops                  [disabled]
  -mno-default                          [disabled]
  -mno-fancy-math-387                   [disabled]
  -mno-push-args                        [disabled]
  -mno-red-zone                         [disabled]
  -mno-sse4                             [enabled]
  -mnop-mcount                          [disabled]
  -momit-leaf-frame-pointer             [disabled]
  -mpc32                                [disabled]
  -mpc64                                [disabled]
  -mpc80                                [disabled]
  -mpclmul                              [disabled]
  -mpcommit                             [disabled]
  -mpconfig                             [disabled]
  -mpku                                 [disabled]
  -mpopcnt                              [enabled]
  -mprefer-avx128             
  -mprefer-vector-width=                none
  -mpreferred-stack-boundary=           0
  -mprefetchwt1                         [disabled]
  -mprfchw                              [enabled]
  -mptwrite                             [disabled]
  -mpush-args                           [enabled]
  -mrdpid                               [disabled]
  -mrdrnd                               [disabled]
  -mrdseed                              [disabled]
  -mrecip                               [disabled]
  -mrecip=                    
  -mrecord-mcount                       [disabled]
  -mrecord-return                       [disabled]
  -mred-zone                            [enabled]
  -mregparm=                            6
  -mrtd                                 [disabled]
  -mrtm                                 [disabled]
  -msahf                                [enabled]
  -msgx                                 [disabled]
  -msha                                 [disabled]
  -mshstk                               [disabled]
  -mskip-rax-setup                      [disabled]
  -msoft-float                          [disabled]
  -msse                                 [enabled]
  -msse2                                [enabled]
  -msse2avx                             [disabled]
  -msse3                                [enabled]
  -msse4                                [disabled]
  -msse4.1                              [disabled]
  -msse4.2                              [disabled]
  -msse4a                               [enabled]
  -msse5                      
  -msseregparm                          [disabled]
  -mssse3                               [disabled]
  -mstack-arg-probe                     [disabled]
  -mstack-protector-guard-offset= 
  -mstack-protector-guard-reg= 
  -mstack-protector-guard-symbol= 
  -mstack-protector-guard=              tls
  -mstackrealign                        [disabled]
  -mstringop-strategy=                  [default]
  -mstv                                 [enabled]
  -mtbm                                 [disabled]
  -mtls-dialect=                        gnu
  -mtls-direct-seg-refs                 [enabled]
  -mtune-ctrl=                
  -mtune=                               amdfam10
  -muclibc                              [disabled]
  -mvaes                                [disabled]
  -mveclibabi=                          [default]
  -mvect8-ret-in-mem                    [disabled]
  -mvpclmulqdq                          [disabled]
  -mvzeroupper                          [enabled]
  -mwaitpkg                             [disabled]
  -mwbnoinvd                            [disabled]
  -mx32                                 [disabled]
  -mxop                                 [disabled]
  -mxsave                               [disabled]
  -mxsavec                              [disabled]
  -mxsaveopt                            [disabled]
  -mxsaves                              [disabled]

  Known assembler dialects (for use with the -masm= option):
    att intel

  Known ABIs (for use with the -mabi= option):
    ms sysv

  Known code models (for use with the -mcmodel= option):
    32 kernel large medium small

  Valid arguments to -mfpmath=:
    387 387+sse 387,sse both sse sse+387 sse,387

  Known indirect branch choices (for use with the -mindirect-branch=/-mfunction-return= options):
    keep thunk thunk-extern thunk-inline

  Known choices for return instrumentation with -minstrument-return=:
    call none nop5

  Known data alignment choices (for use with the -malign-data= option):
    abi cacheline compat

  Known vectorization library ABIs (for use with the -mveclibabi= option):
    acml svml

  Known address mode (for use with the -maddress-mode= option):
    long short

  Known preferred register vector length (to use with the -mprefer-vector-width= option):
    128 256 512 none

  Known stack protector guard (for use with the -mstack-protector-guard= option):
    global tls

  Valid arguments to -mstringop-strategy=:
    byte_loop libcall loop rep_4byte rep_8byte rep_byte unrolled_loop vector_loop

  Known TLS dialects (for use with the -mtls-dialect= option):
    gnu gnu2

  Known valid arguments for -march= option:
    i386 i486 i586 pentium lakemont pentium-mmx winchip-c6 winchip2 c3 samuel-2 c3-2 nehemiah c7 esther i686 pentiumpro pentium2 pentium3 pentium3m pentium-m pentium4 pentium4m prescott nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake tigerlake bonnell atom silvermont slm goldmont goldmont-plus tremont knl knm intel geode k6 k6-2 k6-3 athlon athlon-tbird athlon-4 athlon-xp athlon-mp x86-64 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 btver1 btver2 generic native

  Known valid arguments for -mtune= option:
    generic i386 i486 pentium lakemont pentiumpro pentium4 nocona core2 nehalem sandybridge haswell bonnell silvermont goldmont goldmont-plus tremont knl knm skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake tigerlake intel geode k6 athlon k8 amdfam10 bdver1 bdver2 bdver3 bdver4 btver1 btver2 znver1 znver2

Thanks!

ggerganov commented 1 year ago

This CPU does not seem to support AVX instruction set:

  -mavx                                 [disabled]
  -mavx2                                [disabled]

You can rerun the make commands without the -mavx -mavx2 -mfma -mf16c flags and it will work. But it will be very slow.

dzach commented 1 year ago

Thanks for the quick response! Indeed, it worked but it's exactly as you said, very slow. I guess I'll have to try with some other CPU or a RPi-4 that I have.