ggerganov / llama.cpp

LLM inference in C/C++
MIT License
64.91k stars 9.31k forks

Illegal instruction error when running llama-server #8734

Open AngelaZhang3913 opened 1 month ago

AngelaZhang3913 commented 1 month ago

What happened?

I'm trying to run llama-server using ./llama-server -m models/codellama-7b.Q4_K_M.gguf -c 2048 after building it. I'm getting an Illegal Instruction error message.

The faulting instruction is at 0x000000000045e3bc in void (anonymous namespace)::tinyBLAS<16, float __vector(16), float __vector(16), unsigned short, float, float>::gemm<5, 2>(long, long, long, long) ()

It appears to be an AVX512F instruction. The crash happens in the gemm function, which is the matrix multiplication routine in the tinyBLAS library.

Additional context: I've tried doing export CFLAGS="-march=native -mtune=native -mno-avx512f" and export CXXFLAGS="$CFLAGS" already, but it didn't work.

Another person commented that they were having the same issue as me. I was recommended to make a bug report for this.

Here is my lscpu output:

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              32
On-line CPU(s) list: 0-31
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               79
Model name:          Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Stepping:            1
CPU MHz:             2016.174
CPU max MHz:         3000.0000
CPU min MHz:         1200.0000
BogoMIPS:            4199.74
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            20480K
NUMA node0 CPU(s):   0-7,16-23
NUMA node1 CPU(s):   8-15,24-31
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 invpcid_single intel_ppin intel_pt ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d

Name and Version

$ ./llama-server --version
version: 3384 (4e24cffd) built with gcc (GCC) 8.3.0 for x86_64-pc-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

#0  0x0000000000445f7c in void (anonymous namespace)::tinyBLAS<16, float __vector(16), float __vector(16), unsigned short, float, float>::gemm<5, 2>(long, long, long, long) ()
#1  0x000000000044970e in (anonymous namespace)::tinyBLAS<16, float __vector(16), float __vector(16), unsigned short, float, float>::mnpack(long, long, long, long) ()
#2  0x0000000000456e75 in llamafile_sgemm ()
#3  0x000000000046d183 in ggml_compute_forward_mul_mat ()
#4  0x000000000048d7fd in ggml_graph_compute_thread.constprop.130 ()
#5  0x0000000000491415 in ggml_graph_compute._omp_fn.0 ()
#6  0x00002aaaaac956fe in gomp_thread_start (xdata=<optimized out>)
    at ../.././libgomp/team.c:120
#7  0x00002aaaab21bea5 in start_thread () from /lib64/libpthread.so.0
#8  0x00002aaaab52eb0d in clone () from /lib64/libc.so.6
JohannesGaessler commented 1 month ago

You should be able to fix this by compiling with GGML_NO_LLAMAFILE.
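For anyone else hitting this, a sketch of the two ways to disable the llamafile/tinyBLAS path (flag names as of this version of the build system; double-check against your checkout):

```shell
# Makefile build: disable the llamafile sgemm (tinyBLAS) path
make clean
make GGML_NO_LLAMAFILE=1 llama-server

# CMake build: the equivalent option is GGML_LLAMAFILE (ON by default)
cmake -B build -DGGML_LLAMAFILE=OFF
cmake --build build --config Release -t llama-server
```

This only skips the tinyBLAS fast path; the regular ggml matmul kernels are still used.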

cc: @jart

jart commented 1 month ago

Could you use objdump -d llama-cli to copy and paste me the line of assembly code at the faulting address?
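For reference, one way to pull the relevant disassembly (a sketch; the address is the 0x45e3bc from the original report, and the binary name is llama-server as built above — adjust both for your case):

```shell
# Show a few lines of disassembly around the faulting address
objdump -d llama-server | grep -B 4 -A 4 '45e3bc:'

# Or disassemble just that region directly
objdump -d --start-address=0x45e3a0 --stop-address=0x45e3e0 llama-server
```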

AngelaZhang3913 commented 1 month ago

It seems to have worked with GGML_NO_LLAMAFILE. Thanks a lot!

SnowyCoder commented 1 month ago

Same issue here with a Ryzen 7 7735HS.

cpuinfo:

CPUID is present
CPU Info for type #0:
------------------
  arch       : x86
  purpose    : general
  vendor_str : `AuthenticAMD'
  vendor id  : 1
  brand_str  : `AMD Ryzen 7 7735HS with Radeon Graphics'
  family     : 15 (0Fh)
  model      : 4 (04h)
  stepping   : 1 (01h)
  ext_family : 25 (19h)
  ext_model  : 68 (44h)
  num_cores  : 8
  num_logical: 16
  tot_logical: 16
  affi_mask  : 0x0000FFFF
  L1 D cache : 32 KB
  L1 I cache : 32 KB
  L2 cache   : 512 KB
  L3 cache   : 16384 KB
  L4 cache   : -1 KB
  L1D assoc. : 8-way
  L1I assoc. : 8-way
  L2 assoc.  : 8-way
  L3 assoc.  : 16-way
  L4 assoc.  : -1-way
  L1D line sz: 64 bytes
  L1I line sz: 64 bytes
  L2 line sz : 64 bytes
  L3 line sz : 64 bytes
  L4 line sz : -1 bytes
  L1D inst.  : 8
  L1I inst.  : 8
  L2 inst.   : 8
  L3 inst.   : 1
  L4 inst.   : 0
  SSE units  : 256 bits (authoritative)
  code name  : `Ryzen 7 (Rembrandt)'
  features   : fpu vme de pse tsc msr pae mce cx8 apic mtrr sep pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht pni pclmul monitor ssse3 cx16 sse4_1 sse4_2 syscall movbe popcnt aes xsave osxsave avx mmxext nx fxsr_opt rdtscp lm lahf_lm cmp_legacy svm abm misalignsse sse4a 3dnowprefetch osvw ibs skinit wdt ts ttp tm_amd hwpstate constant_tsc fma3 f16c rdrand cpb aperfmperf avx2 bmi1 bmi2 sha_ni rdseed adx

The program tries to execute vmovupd %zmm0,0x13(%rax), which is not supported by my CPU model (it's an AVX512F instruction).
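A quick way to confirm what the kernel thinks the CPU supports (a sketch; the zmm register in the instruction above implies at least AVX512F):

```shell
# Check whether the CPU advertises avx512f; if not, any zmm
# instruction like the vmovupd above will trap with SIGILL.
if grep -qw avx512f /proc/cpuinfo; then
    echo "avx512f: supported"
else
    echo "avx512f: not supported"
fi
```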

qnixsynapse commented 1 month ago

I think it would be better to fail gracefully rather than crash with an Illegal Instruction error when the CPU doesn't support an instruction that llamafile is trying to use. In @AngelaZhang3913's case, the CPU model is quite old and, according to Intel, does not support AVX-512.

jart commented 1 month ago

Was this binary compiled on the host system? Chances are the binary was built somewhere else on a machine that has AVX512F and then copied over to the old computer.

SnowyCoder commented 1 month ago

I used the binary packaged by the arch OS: https://archlinux.org/packages/extra/x86_64/ollama/

MatthewCroughan commented 1 month ago

I'm running into this when building the flake.nix on a remote builder. It is an impurity that shouldn't exist in the build.

MatthewCroughan commented 1 month ago

I fixed this when using Nix by flipping GGML_NATIVE_DEFAULT in CMake. I believe everything builds as native by default now, but the logic spans multiple levels between the Nix expression, the Makefiles, and their _DEFAULT values, so it's hard to read. It's possible something was flipped by accident during the LLAMA_NATIVE -> GGML_NATIVE rename in the past few months.

The Nix expression in .devops/nix/package (why put this stuff in a dot-directory?) sets (cmakeBool "GGML_NATIVE" false), yet despite GGML_NATIVE being set to false, I still had to flip GGML_NATIVE_DEFAULT, which implies a logic error somewhere in the build scripts.

diff --git a/ggml/CMakeLists.txt b/ggml/CMakeLists.txt
index 7fe1661b..363413a9 100644
--- a/ggml/CMakeLists.txt
+++ b/ggml/CMakeLists.txt
@@ -53,7 +53,7 @@ endif()
 if (CMAKE_CROSSCOMPILING)
     set(GGML_NATIVE_DEFAULT OFF)
 else()
-    set(GGML_NATIVE_DEFAULT ON)
+    set(GGML_NATIVE_DEFAULT OFF)
 endif()

 # general
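Passing the option on the command line should be equivalent to patching the default (a sketch, assuming the post-rename GGML_NATIVE option name):

```shell
# Configure a reproducible, non-native build; -march=native should
# not be applied, so no AVX-512 code on non-AVX-512 build hosts
cmake -B build -DGGML_NATIVE=OFF
cmake --build build --config Release
```

If a build configured this way still emits AVX-512 code, the flag is being dropped somewhere between the Nix expression and CMake.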
James-E-A commented 2 weeks ago

I'm also getting this error out-of-the-box.

Using NixOS on an i5-3230M (supports AVX but not AVX2), trying to run the flake with nix run.