Closed by ensconced 10 months ago
I can't reproduce this. If I whip up:
#include "llama.cpp/ggml.h"
float x[10];
ggml_fp16_t y[10];
int main(int argc, char *argv[]) {
    ggml_fp32_to_fp16_row(x, y, 10);
}
And run it with qemu-x86_64 -cpu core2duo, then it doesn't SIGILL. The code in question is here:
As we can see, it checks CPUID before using instructions your CPU doesn't support. So I'd assume the only way this could happen is if your hypervisor is buggy and reports that F16C is available when it actually isn't. I'm reasonably certain we're not at fault here.
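For readers following along, here is a minimal sketch of what a CPUID-based F16C check can look like. It is not the project's actual code: the helper names ecx_reports_f16c and cpu_has_f16c are made up for illustration, and it relies on the __get_cpuid() helper from the GCC/Clang <cpuid.h> header. A complete check would also read XCR0 via XGETBV to confirm the OS has enabled AVX state, which is left out here for brevity.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper header providing __get_cpuid() */
#endif

/* Bit positions in CPUID leaf 1, register ECX. */
#define ECX_OSXSAVE (1u << 27)
#define ECX_AVX     (1u << 28)
#define ECX_F16C    (1u << 29)

/* Hypothetical helper: true only if ECX advertises F16C together with the
   AVX and OSXSAVE bits that VEX-encoded instructions depend on. */
static bool ecx_reports_f16c(unsigned ecx) {
    const unsigned need = ECX_OSXSAVE | ECX_AVX | ECX_F16C;
    return (ecx & need) == need;
}

/* Hypothetical helper: query the running CPU; always false on non-x86 builds. */
static bool cpu_has_f16c(void) {
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;            /* leaf 1 not supported */
    return ecx_reports_f16c(ecx);
#else
    return false;
#endif
}
```

On a CPU whose feature dump lists AVX and F16C as unavailable, cpu_has_f16c() should return false, so a SIGILL there would point at an F16C instruction being reached without passing through a guard like this.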
In order to continue troubleshooting this, could you please run:
gdb o//llama.cpp/main/main -ex 'set args -m weights.gguf etc.'
then type layout asm, and then type run, to get me the specific assembly code lines that are faulting, and ideally their addresses too.
Here is the output from gdb o/llama.cpp/main/main.com.dbg -ex 'set args -m ../mistral.gguf'
(I wasn't able to find the responsible asm line using the exact command you suggested)
cpuid output:
GenuineIntel Kabylake (Client Grade)
frequency 3500mhz
turbo 3500mhz
bus 100mhz
kX86CpuFamily 0x6
kX86CpuModel 0x9e
kX86CpuStepping 0x9
kX86CpuType 0
kX86CpuModelid 0xe
kX86CpuExtmodelid 0x9
kX86CpuFamilyid 0x6
kX86CpuExtfamilyid 0
TSC_AUX 0
→ core 0
→ node 0
Caches
──────
Level 1 data 8-way 32,768 byte cache w/ 64 sets of 64 byte lines shared across 2 threads
Level 1 code 8-way 32,768 byte cache w/ 64 sets of 64 byte lines shared across 2 threads
Level 2 4-way 262,144 byte cache w/ 1,024 sets of 64 byte lines shared across 2 threads
Level 3 complexly-indexed 12-way 3,145,728 byte cache w/ 4,096 sets of 64 byte lines shared across 16 threads
Features
────────
ACC present
ACPI present
ADX unavailable
AES present
APIC present
ARCH_CAPABILITIES unavailable
AVX unavailable
AVX2 unavailable
AVXVNNI unavailable
AVX512BW unavailable
AVX512CD unavailable
AVX512DQ unavailable
AVX512ER unavailable
AVX512F unavailable
AVX512IFMA unavailable
AVX512PF unavailable
AVX512VBMI unavailable
AVX512VL unavailable
AVX512_4FMAPS unavailable
AVX512_4VNNIW unavailable
AVX512_BF16 unavailable
AVX512_BITALG unavailable
AVX512_VBMI2 unavailable
AVX512_VNNI unavailable
AVX512_VP2INTERSECT unavailable
AVX512_VPOPCNTDQ unavailable
BMI unavailable
BMI2 unavailable
CID unavailable
CLDEMOTE unavailable
CLFLUSH present
CLFLUSHOPT present
CLWB unavailable
CMOV present
CQM unavailable
CX16 present
CX8 present
DCA unavailable
DE present
DS present
DSCPL present
DTES64 present
ERMS present
EST present
F16C unavailable
FDP_EXCPTN_ONLY unavailable
FLUSH_L1D unavailable
FMA unavailable
FPU present
FSGSBASE present
FXSR present
GBPAGES present
GFNI unavailable
HLE unavailable
HT present
HYPERVISOR unavailable
IA64 unavailable
INTEL_PT present
INTEL_STIBP unavailable
INVPCID unavailable
LA57 unavailable
LM present
MCA present
MCE present
MD_CLEAR unavailable
MMX present
MOVBE present
MOVDIR64B unavailable
MOVDIRI unavailable
MP unavailable
MPX present
MSR present
MTRR present
MWAIT present
NX present
OSPKE unavailable
OSXSAVE present
PAE present
PAT present
PBE present
PCID present
PCLMUL present
PCONFIG unavailable
PDCM present
PGE present
PKU unavailable
PN unavailable
POPCNT present
PSE present
PSE36 present
RDPID unavailable
RDRND present
RDSEED present
RDTSCP present
RDT_A unavailable
RTM unavailable
SDBG present
SELFSNOOP present
SEP present
SHA unavailable
SMAP present
SMEP present
SMX unavailable
SPEC_CTRL unavailable
SPEC_CTRL_SSBD unavailable
SSE present
SSE2 present
SSE3 present
SSE4_1 present
SSE4_2 present
SSSE3 present
SYSCALL present
TM2 present
TME unavailable
TSC present
TSC_ADJUST present
TSC_DEADLINE_TIMER present
TSX_FORCE_ABORT unavailable
UMIP unavailable
VAES unavailable
VME present
VMX present
VPCLMULQDQ unavailable
WAITPKG unavailable
X2APIC present
XSAVE present
XTPR present
ZERO_FCS_FDS present
AMD Stuff
─────────
3DNOW unavailable
3DNOWEXT unavailable
3DNOWPREFETCH present
ABM present
BPEXT unavailable
CMP_LEGACY unavailable
CR8_LEGACY unavailable
EXTAPIC unavailable
FMA4 unavailable
FXSR_OPT unavailable
IBS unavailable
LAHF_LM present
LWP unavailable
MISALIGNSSE unavailable
MMXEXT unavailable
MWAITX unavailable
NODEID_MSR unavailable
OSVW unavailable
OVERFLOW_RECOV unavailable
PERFCTR_CORE unavailable
PERFCTR_LLC unavailable
PERFCTR_NB unavailable
PTSC unavailable
SKINIT unavailable
SMCA unavailable
SSE4A unavailable
SUCCOR unavailable
SVM unavailable
TBM unavailable
TCE unavailable
TOPOEXT unavailable
WDT unavailable
XOP unavailable
similar result here.
keen@bee:~/llamafile$ ./llava-v1.5-7b-q4-server.llamafile --ftrace
...
FUN 2294950 2294966 154'993'012'634 576 &ggml_get_n_tasks
FUN 2294950 2294967 154'993'029'990 944 &ggml_metal_supported
FUN 2294950 2294967 154'993'032'040 960 &cosmo_once
FUN 2294950 2294966 154'993'032'230 576 &ggml_compute_forward.part.0
FUN 2294950 2294967 154'993'033'544 944 &ggml_cublas_loaded
FUN 2294950 2294950 154'993'022'194 -123'126'943'063'576 &cosmo_once
FUN 2294950 2294950 154'993'108'048 -123'126'943'063'592 &ggml_compute_forward_dup
FUN 2294950 2294965 154'993'024'067 944 &ggml_metal_supported
FUN 2294950 2294950 154'993'111'218 -123'126'943'063'304 &ggml_fp32_to_fp16_row
FUN 2294950 2294966 154'993'034'450 944 &ggml_metal_supported
FUN 2294950 2294965 154'993'114'109 960 &cosmo_once
FUN 2294950 2294967 154'993'101'492 960 &cosmo_once
FUN 2294950 2294965 154'993'123'331 944 &ggml_cublas_loaded
FUN 2294950 2294965 154'993'126'456 960 &cosmo_once
FUN 2294950 2294966 154'993'120'953 960 &cosmo_once
FUN 2294950 2294950 154'993'132'658 -123'126'943'062'120 &__oncrash
FUN 2294950 2294966 154'993'135'157 944 &ggml_cublas_loaded
FUN 2294950 2294967 154'993'126'560 944 &ggml_compute_forward_dup
FUN 2294950 2294950 154'993'135'148 -123'126'943'062'072 &__errno_location
FUN 2294950 2294966 154'993'137'062 960 &cosmo_once
FUN 2294950 2294967 154'993'139'167 1'232 &ggml_fp32_to_fp16_row
FUN 2294950 2294965 154'993'131'725 944 &ggml_compute_forward_dup
FUN 2294950 2294950 154'993'139'486 -123'126'943'062'072 &IsDebuggerPresent
FUN 2294950 2294965 154'993'146'275 1'232 &ggml_fp32_to_fp16_row
FUN 2294950 2294950 154'993'148'912 -123'126'943'060'968 &__errno_location
FUN 2294950 2294966 154'993'140'142 944 &ggml_compute_forward_dup
FUN 2294950 2294950 154'993'151'746 -123'126'943'060'968 &_pthread_block_cancelation
FUN 2294950 2294950 154'993'159'880 -123'126'943'060'968 &__sys_openat
FUN 2294950 2294966 154'993'158'234 1'232 &ggml_fp32_to_fp16_row
Illegal instruction (core dumped)
keen@bee:~/llamafile$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Celeron(R) N5095 @ 2.00GHz
CPU family: 6
Model: 156
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: 0
CPU max MHz: 2900.0000
CPU min MHz: 800.0000
BogoMIPS: 3993.60
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave rdrand lahf_lm 3dnowprefetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust smep erms rdt_a rdseed smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req umip waitpkg gfni rdpid movdiri movdir64b md_clear flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 128 KiB (4 instances)
L1i: 128 KiB (4 instances)
L2: 1.5 MiB (1 instance)
L3: 4 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerabilities:
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Mitigation; Clear CPU buffers; SMT disabled
Retbleed: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
Srbds: Vulnerable: No microcode
Tsx async abort: Not affected
Plus one:
$ ./mistral-7b-instruct-server.llamafile --ftrace
FUN 2712860 2712915 28'046'631'127 944 &ggml_metal_supported
FUN 2712860 2712860 28'046'620'106 -123'137'703'420'504 &ggml_compute_forward.part.0
FUN 2712860 2712860 28'046'727'728 -123'137'703'420'136 &ggml_metal_supported
FUN 2712860 2712860 28'046'729'536 -123'137'703'420'120 &cosmo_once
FUN 2712860 2712860 28'046'730'849 -123'137'703'420'136 &ggml_cublas_loaded
FUN 2712860 2712860 28'046'732'110 -123'137'703'420'120 &cosmo_once
FUN 2712860 2712916 28'046'631'282 576 &ggml_compute_forward.part.0
FUN 2712860 2712860 28'046'734'079 -123'137'703'420'136 &ggml_compute_forward_dup
FUN 2712860 2712860 28'046'738'463 -123'137'703'419'848 &ggml_fp32_to_fp16_row
FUN 2712860 2712916 28'046'738'226 944 &ggml_metal_supported
FUN 2712860 2712916 28'046'741'931 960 &cosmo_once
FUN 2712860 2712916 28'046'745'814 944 &ggml_cublas_loaded
FUN 2712860 2712916 28'046'749'080 960 &cosmo_once
FUN 2712860 2712916 28'046'750'874 944 &ggml_compute_forward_dup
FUN 2712860 2712916 28'046'754'283 1'232 &ggml_fp32_to_fp16_row
FUN 2712860 2712860 28'046'758'872 -123'137'703'418'152 &__oncrash
FUN 2712860 2712860 28'046'762'429 -123'137'703'418'104 &__errno_location
FUN 2712860 2712860 28'046'769'785 -123'137'703'418'104 &IsDebuggerPresent
FUN 2712860 2712860 28'046'776'415 -123'137'703'417'000 &__errno_location
FUN 2712860 2712860 28'046'782'505 -123'137'703'417'000 &_pthread_block_cancelation
FUN 2712860 2712860 28'046'788'948 -123'137'703'417'000 &__sys_openat
Illegal instruction (core dumped)
GenuineIntel Goldmont (Appliance Grade)
kX86CpuFamily 0x6
kX86CpuModel 0x5c
kX86CpuStepping 0x9
kX86CpuType 0
kX86CpuModelid 0xc
kX86CpuExtmodelid 0x5
kX86CpuFamilyid 0x6
kX86CpuExtfamilyid 0
TSC_AUX 0
→ core 0
→ node 0
Caches
──────
Level 1 data 6-way 24,576 byte cache w/ 64 sets of 64 byte lines shared across 1 threads
Level 1 code 8-way 32,768 byte cache w/ 64 sets of 64 byte lines shared across 1 threads
Level 2 16-way 1,048,576 byte cache w/ 1,024 sets of 64 byte lines shared across 4 threads
Features
────────
ACC present
ACPI present
ADX unavailable
AES present
APIC present
ARCH_CAPABILITIES present
AVX unavailable
AVX2 unavailable
AVXVNNI unavailable
AVX512BW unavailable
AVX512CD unavailable
AVX512DQ unavailable
AVX512ER unavailable
AVX512F unavailable
AVX512IFMA unavailable
AVX512PF unavailable
AVX512VBMI unavailable
AVX512VL unavailable
AVX512_4FMAPS unavailable
AVX512_4VNNIW unavailable
AVX512_BF16 unavailable
AVX512_BITALG unavailable
AVX512_VBMI2 unavailable
AVX512_VNNI unavailable
AVX512_VP2INTERSECT unavailable
AVX512_VPOPCNTDQ unavailable
BMI unavailable
BMI2 unavailable
CID unavailable
CLDEMOTE unavailable
CLFLUSH present
CLFLUSHOPT present
CLWB unavailable
CMOV present
CQM unavailable
CX16 present
CX8 present
DCA unavailable
DE present
DS present
DSCPL present
DTES64 present
ERMS present
EST present
F16C unavailable
FDP_EXCPTN_ONLY unavailable
FLUSH_L1D unavailable
FMA unavailable
FPU present
FSGSBASE present
FXSR present
GBPAGES present
GFNI unavailable
HLE unavailable
HT present
HYPERVISOR unavailable
IA64 unavailable
INTEL_PT present
INTEL_STIBP present
INVPCID unavailable
LA57 unavailable
LM present
MCA present
MCE present
MD_CLEAR present
MMX present
MOVBE present
MOVDIR64B unavailable
MOVDIRI unavailable
MP unavailable
MPX present
MSR present
MTRR present
MWAIT present
NX present
OSPKE unavailable
OSXSAVE present
PAE present
PAT present
PBE present
PCID unavailable
PCLMUL present
PCONFIG unavailable
PDCM present
PGE present
PKU unavailable
PN unavailable
POPCNT present
PSE present
PSE36 present
RDPID unavailable
RDRND present
RDSEED present
RDTSCP present
RDT_A present
RTM unavailable
SDBG present
SELFSNOOP present
SEP present
SHA present
SMAP present
SMEP present
SMX unavailable
SPEC_CTRL present
SPEC_CTRL_SSBD unavailable
SSE present
SSE2 present
SSE3 present
SSE4_1 present
SSE4_2 present
SSSE3 present
SYSCALL present
TM2 present
TME unavailable
TSC present
TSC_ADJUST present
TSC_DEADLINE_TIMER present
TSX_FORCE_ABORT unavailable
UMIP unavailable
VAES unavailable
VME present
VMX present
VPCLMULQDQ unavailable
WAITPKG unavailable
X2APIC present
XSAVE present
XTPR present
ZERO_FCS_FDS present
AMD Stuff
─────────
3DNOW unavailable
3DNOWEXT unavailable
3DNOWPREFETCH present
ABM unavailable
BPEXT unavailable
CMP_LEGACY unavailable
CR8_LEGACY unavailable
EXTAPIC unavailable
FMA4 unavailable
FXSR_OPT unavailable
IBS unavailable
LAHF_LM present
LWP unavailable
MISALIGNSSE unavailable
MMXEXT unavailable
MWAITX unavailable
NODEID_MSR unavailable
OSVW unavailable
OVERFLOW_RECOV unavailable
PERFCTR_CORE unavailable
PERFCTR_LLC unavailable
PERFCTR_NB unavailable
PTSC unavailable
SKINIT unavailable
SMCA unavailable
SSE4A unavailable
SUCCOR unavailable
SVM unavailable
TBM unavailable
TCE unavailable
TOPOEXT unavailable
WDT unavailable
XOP unavailable
Same here, on Intel(R) Core(TM) i5 CPU M 460 @ 2.53GHz, OS: Ubuntu 22.04.3 LTS.
Log ending when running the executable:
./llava-v1.5-7b-q4-server.llamafile
...
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.12 MiB
llm_load_tensors: mem required = 3891.36 MiB
..................................................................................................
llama_new_context_with_model: n_ctx = 2048
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size = 1024.00 MiB, K (f16): 512.00 MiB, V (f16): 512.00 MiB
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 159.32 MiB
Illegal instruction (core dumped)
Taking a closer look at the implementation, I think this is happening because ggml_compute_fp32_to_fp16() is declared as inline. The fix should be pretty simple.
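To illustrate the kind of code path that has to be used when F16C is missing, here is a rough sketch of a software-only fp32 to fp16 conversion with round-to-nearest-even. It is not the actual fix and not ggml's implementation; the names fp32_to_fp16_soft and fp32_to_fp16_row_soft are hypothetical. The point is only that a fallback like this contains no F16C instructions, so it is safe to call on the CPUs reported in this thread.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar fallback: convert one float to IEEE half bits
   using integer arithmetic only (round to nearest, ties to even). */
static uint16_t fp32_to_fp16_soft(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);                 /* reinterpret the float bits */
    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t exp  = (x >> 23) & 0xFFu;
    uint32_t mant = x & 0x7FFFFFu;

    if (exp == 0xFFu)                         /* Inf or NaN */
        return (uint16_t)(sign | 0x7C00u | (mant ? 0x0200u : 0u));

    int32_t e = (int32_t)exp - 127 + 15;      /* rebias the exponent */
    if (e >= 31)                              /* too large: becomes Inf */
        return (uint16_t)(sign | 0x7C00u);

    if (e <= 0) {                             /* subnormal half or zero */
        if (e < -10)
            return (uint16_t)sign;            /* rounds to signed zero */
        mant |= 0x800000u;                    /* restore the implicit 1 */
        uint32_t shift   = (uint32_t)(14 - e);
        uint32_t half    = mant >> shift;
        uint32_t rem     = mant & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1u);
        if (rem > halfway || (rem == halfway && (half & 1u)))
            half++;
        return (uint16_t)(sign | half);
    }

    uint32_t half = ((uint32_t)e << 10) | (mant >> 13);
    uint32_t rem  = mant & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
        half++;                               /* a carry into the exponent is intended */
    return (uint16_t)(sign | half);
}

/* Hypothetical row helper mirroring the shape of a row conversion routine. */
static void fp32_to_fp16_row_soft(const float *x, uint16_t *y, int n) {
    for (int i = 0; i < n; i++)
        y[i] = fp32_to_fp16_soft(x[i]);
}
```

A dispatcher would call something like this whenever the CPUID check says F16C is unavailable, and the F16C-compiled variant must not be inlined into any caller that runs before that check.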
The server crashes with "illegal instruction" on my machine. I tried running as
./mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile --ftrace
and these were the last few lines:
Output of lscpu: