Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.
https://llamafile.ai
Other

server crashes with "illegal instruction" on Intel Pentium #90

Closed: ensconced closed this issue 10 months ago

ensconced commented 11 months ago

The server crashes with "illegal instruction" on my machine. I ran it as ./mistral-7b-instruct-v0.1-Q4_K_M-server.llamafile --ftrace, and these were the last few lines:

FUN   1186   1193     12'910'697'524   1'488         &sincosf
FUN   1186   1193     12'910'721'920   1'440       &rope_yarn
FUN   1186   1193     12'910'739'450   1'488         &sincosf
FUN   1186   1193     12'910'756'421   1'440       &rope_yarn
FUN   1186   1193     12'910'771'800   1'488         &sincosf
FUN   1186   1193     12'910'780'829   1'440       &rope_yarn
FUN   1186   1193     12'910'797'442   1'488         &sincosf
FUN   1186   1193     12'910'813'827   1'440       &rope_yarn
FUN   1186   1193     12'910'821'714   1'488         &sincosf
FUN   1186   1193     12'910'838'144   1'440       &rope_yarn
FUN   1186   1193     12'910'854'546   1'488         &sincosf
FUN   1186   1193     12'910'869'678   1'440       &rope_yarn
FUN   1186   1193     12'910'878'743   1'488         &sincosf
FUN   1186   1193     12'910'895'688     592   &ggml_cuda_compute_forward
FUN   1186   1193     12'910'912'564     624     &cosmo_once
FUN   1186   1193     12'910'927'982     592   &ggml_compute_forward.part.0
FUN   1186   1193     12'910'937'523     592   &ggml_cuda_compute_forward
FUN   1186   1193     12'910'950'949     624     &cosmo_once
FUN   1186   1194     12'910'937'843     592   &ggml_cuda_compute_forward
FUN   1186   1193     12'910'954'772     592   &ggml_compute_forward.part.0
FUN   1186   1193     12'910'960'021     976     &ggml_compute_forward_dup
FUN   1186   1194     12'910'960'306     624     &cosmo_once
FUN   1186   1193     12'910'965'286   1'264       &ggml_fp32_to_fp16_row
FUN   1186   1194     12'910'965'862     592   &ggml_compute_forward.part.0
FUN   1186   1194     12'910'971'276     976     &ggml_compute_forward_dup
FUN   1186   1194     12'910'974'922   1'264       &ggml_fp32_to_fp16_row
Illegal instruction

Output of lscpu:

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Pentium(R) CPU G4560 @ 3.50GHz
    CPU family:          6
    Model:               158
    Thread(s) per core:  2
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            9
    CPU(s) scaling MHz:  23%
    CPU max MHz:         3500.0000
    CPU min MHz:         800.0000
    BogoMIPS:            6999.82
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
                         tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
                         cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg cx16 xtpr pdcm pcid sse4_1 sse4_2
                         x2apic movbe popcnt aes xsave rdrand lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti tpr_shadow vnmi
                         flexpriority ept vpid ept_ad fsgsbase tsc_adjust smep erms invpcid mpx rdseed smap clflushopt intel_pt xsaveopt
                         xsavec xgetbv1 xsaves dtherm arat pln pts hwp hwp_notify hwp_act_window hwp_epp
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   64 KiB (2 instances)
  L1i:                   64 KiB (2 instances)
  L2:                    512 KiB (2 instances)
  L3:                    3 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-3
Vulnerabilities:         
  Gather data sampling:  Not affected
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                   Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
  Meltdown:              Mitigation; PTI
  Mmio stale data:       Vulnerable: Clear CPU buffers attempted, no microcode; SMT vulnerable
  Retbleed:              Vulnerable
  Spec rstack overflow:  Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Vulnerable: No microcode
  Tsx async abort:       Not affected
jart commented 11 months ago

I can't reproduce this. If I whip up:

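#include "llama.cpp/ggml.h"

// Zero-initialized buffers; the only question is whether the call faults.
float x[10];
ggml_fp16_t y[10];

int main(int argc, char *argv[]) {
  ggml_fp32_to_fp16_row(x, y, 10);  // would SIGILL if F16C code ran on a CPU without F16C
  return 0;
}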

And run it with qemu-x86_64 -cpu core2duo, it doesn't SIGILL. The code in question is here:

https://github.com/Mozilla-Ocho/llamafile/blob/c236a71d19cc5947dc1dee896b5982535e41e315/llama.cpp/ggml.c#L189-L217

As we can see, it checks CPUID before using instructions your CPU doesn't support. So I'd assume the only way this could happen would be if your hypervisor is buggy and reports F16C support when the CPU doesn't actually have it. I'm reasonably certain we're not at fault here.
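For readers unfamiliar with the pattern, here is a minimal sketch of what a runtime F16C check looks like in C. It is not the code at the linked ggml.c lines; it assumes GCC or Clang (for <cpuid.h> and its bit_F16C / bit_OSXSAVE macros), and the function name cpu_has_f16c is hypothetical. Besides the F16C bit, it also confirms that the OS saves the AVX register state, since VCVTPS2PH is a VEX-encoded instruction. On a CPU like the Pentium G4560 above, whose lscpu flags contain no f16c, it should return false.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Read XCR0 (extended control register 0). Only valid when CPUID reports
   OSXSAVE, meaning the OS has enabled the XGETBV instruction. */
static unsigned long long read_xcr0(void) {
    unsigned int lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((unsigned long long)hi << 32) | lo;
}

/* Hypothetical helper: true only if the CPU advertises F16C and the OS
   saves the XMM/YMM state that VEX-encoded instructions rely on. */
bool cpu_has_f16c(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                  /* CPUID leaf 1 not available */
    if (!(ecx & bit_F16C))             /* CPUID.01H:ECX bit 29 */
        return false;
    if (!(ecx & bit_OSXSAVE))          /* CPUID.01H:ECX bit 27 */
        return false;
    return (read_xcr0() & 0x6) == 0x6; /* XCR0 bit 1 (SSE) and bit 2 (AVX) */
}
#else
/* Not x86: F16C does not exist here. */
bool cpu_has_f16c(void) { return false; }
#endif
```

The result does not change while the process runs, so a real program would typically compute it once and cache it rather than issuing CPUID on every conversion.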

In order to continue troubleshooting this, could you please:

  1. Build llamafile from source, run gdb o//llama.cpp/main/main -ex 'set args -m weights.gguf etc.', type layout asm, and then run, to get me the specific assembly lines that are faulting, and ideally their addresses too.
  2. Download the program https://cosmo.zip/pub/cosmos/bin/cpuid, run it on your system, and post its full output here.
ensconced commented 11 months ago

Here is the output from gdb o/llama.cpp/main/main.com.dbg -ex 'set args -m ../mistral.gguf' (I wasn't able to find the responsible asm line using the exact command you suggested):

[Screenshot 2023-12-13 at 12 24 08]

Output of cpuid:

GenuineIntel Kabylake (Client Grade)

frequency           3500mhz
turbo               3500mhz
bus                 100mhz

kX86CpuFamily       0x6
kX86CpuModel        0x9e

kX86CpuStepping     0x9
kX86CpuType         0
kX86CpuModelid      0xe
kX86CpuExtmodelid   0x9
kX86CpuFamilyid     0x6
kX86CpuExtfamilyid  0

TSC_AUX             0
 → core             0
 → node             0

Caches
──────
Level 1 data        8-way  32,768 byte cache w/    64 sets of 64 byte lines shared across 2 threads
Level 1 code        8-way  32,768 byte cache w/    64 sets of 64 byte lines shared across 2 threads
Level 2             4-way 262,144 byte cache w/ 1,024 sets of 64 byte lines shared across 2 threads
Level 3             complexly-indexed 12-way 3,145,728 byte cache w/ 4,096 sets of 64 byte lines shared across 16 threads

Features
────────
ACC                 present
ACPI                present
ADX                 unavailable
AES                 present
APIC                present
ARCH_CAPABILITIES   unavailable
AVX                 unavailable
AVX2                unavailable
AVXVNNI             unavailable
AVX512BW            unavailable
AVX512CD            unavailable
AVX512DQ            unavailable
AVX512ER            unavailable
AVX512F             unavailable
AVX512IFMA          unavailable
AVX512PF            unavailable
AVX512VBMI          unavailable
AVX512VL            unavailable
AVX512_4FMAPS       unavailable
AVX512_4VNNIW       unavailable
AVX512_BF16         unavailable
AVX512_BITALG       unavailable
AVX512_VBMI2        unavailable
AVX512_VNNI         unavailable
AVX512_VP2INTERSECT unavailable
AVX512_VPOPCNTDQ    unavailable
BMI                 unavailable
BMI2                unavailable
CID                 unavailable
CLDEMOTE            unavailable
CLFLUSH             present
CLFLUSHOPT          present
CLWB                unavailable
CMOV                present
CQM                 unavailable
CX16                present
CX8                 present
DCA                 unavailable
DE                  present
DS                  present
DSCPL               present
DTES64              present
ERMS                present
EST                 present
F16C                unavailable
FDP_EXCPTN_ONLY     unavailable
FLUSH_L1D           unavailable
FMA                 unavailable
FPU                 present
FSGSBASE            present
FXSR                present
GBPAGES             present
GFNI                unavailable
HLE                 unavailable
HT                  present
HYPERVISOR          unavailable
IA64                unavailable
INTEL_PT            present
INTEL_STIBP         unavailable
INVPCID             unavailable
LA57                unavailable
LM                  present
MCA                 present
MCE                 present
MD_CLEAR            unavailable
MMX                 present
MOVBE               present
MOVDIR64B           unavailable
MOVDIRI             unavailable
MP                  unavailable
MPX                 present
MSR                 present
MTRR                present
MWAIT               present
NX                  present
OSPKE               unavailable
OSXSAVE             present
PAE                 present
PAT                 present
PBE                 present
PCID                present
PCLMUL              present
PCONFIG             unavailable
PDCM                present
PGE                 present
PKU                 unavailable
PN                  unavailable
POPCNT              present
PSE                 present
PSE36               present
RDPID               unavailable
RDRND               present
RDSEED              present
RDTSCP              present
RDT_A               unavailable
RTM                 unavailable
SDBG                present
SELFSNOOP           present
SEP                 present
SHA                 unavailable
SMAP                present
SMEP                present
SMX                 unavailable
SPEC_CTRL           unavailable
SPEC_CTRL_SSBD      unavailable
SSE                 present
SSE2                present
SSE3                present
SSE4_1              present
SSE4_2              present
SSSE3               present
SYSCALL             present
TM2                 present
TME                 unavailable
TSC                 present
TSC_ADJUST          present
TSC_DEADLINE_TIMER  present
TSX_FORCE_ABORT     unavailable
UMIP                unavailable
VAES                unavailable
VME                 present
VMX                 present
VPCLMULQDQ          unavailable
WAITPKG             unavailable
X2APIC              present
XSAVE               present
XTPR                present
ZERO_FCS_FDS        present

AMD Stuff
─────────
3DNOW               unavailable
3DNOWEXT            unavailable
3DNOWPREFETCH       present
ABM                 present
BPEXT               unavailable
CMP_LEGACY          unavailable
CR8_LEGACY          unavailable
EXTAPIC             unavailable
FMA4                unavailable
FXSR_OPT            unavailable
IBS                 unavailable
LAHF_LM             present
LWP                 unavailable
MISALIGNSSE         unavailable
MMXEXT              unavailable
MWAITX              unavailable
NODEID_MSR          unavailable
OSVW                unavailable
OVERFLOW_RECOV      unavailable
PERFCTR_CORE        unavailable
PERFCTR_LLC         unavailable
PERFCTR_NB          unavailable
PTSC                unavailable
SKINIT              unavailable
SMCA                unavailable
SSE4A               unavailable
SUCCOR              unavailable
SVM                 unavailable
TBM                 unavailable
TCE                 unavailable
TOPOEXT             unavailable
WDT                 unavailable
XOP                 unavailable
keen99 commented 11 months ago

Similar result here.

keen@bee:~/llamafile$ ./llava-v1.5-7b-q4-server.llamafile --ftrace

...
FUN 2294950 2294966    154'993'012'634     576   &ggml_get_n_tasks
FUN 2294950 2294967    154'993'029'990     944     &ggml_metal_supported
FUN 2294950 2294967    154'993'032'040     960       &cosmo_once
FUN 2294950 2294966    154'993'032'230     576   &ggml_compute_forward.part.0
FUN 2294950 2294967    154'993'033'544     944     &ggml_cublas_loaded
FUN 2294950 2294950    154'993'022'194 -123'126'943'063'576                   &cosmo_once
FUN 2294950 2294950    154'993'108'048 -123'126'943'063'592                 &ggml_compute_forward_dup
FUN 2294950 2294965    154'993'024'067     944     &ggml_metal_supported
FUN 2294950 2294950    154'993'111'218 -123'126'943'063'304                   &ggml_fp32_to_fp16_row
FUN 2294950 2294966    154'993'034'450     944     &ggml_metal_supported
FUN 2294950 2294965    154'993'114'109     960       &cosmo_once
FUN 2294950 2294967    154'993'101'492     960       &cosmo_once
FUN 2294950 2294965    154'993'123'331     944     &ggml_cublas_loaded
FUN 2294950 2294965    154'993'126'456     960       &cosmo_once
FUN 2294950 2294966    154'993'120'953     960       &cosmo_once
FUN 2294950 2294950    154'993'132'658 -123'126'943'062'120                   &__oncrash
FUN 2294950 2294966    154'993'135'157     944     &ggml_cublas_loaded
FUN 2294950 2294967    154'993'126'560     944     &ggml_compute_forward_dup
FUN 2294950 2294950    154'993'135'148 -123'126'943'062'072                     &__errno_location
FUN 2294950 2294966    154'993'137'062     960       &cosmo_once
FUN 2294950 2294967    154'993'139'167   1'232       &ggml_fp32_to_fp16_row
FUN 2294950 2294965    154'993'131'725     944     &ggml_compute_forward_dup
FUN 2294950 2294950    154'993'139'486 -123'126'943'062'072                     &IsDebuggerPresent
FUN 2294950 2294965    154'993'146'275   1'232       &ggml_fp32_to_fp16_row
FUN 2294950 2294950    154'993'148'912 -123'126'943'060'968                       &__errno_location
FUN 2294950 2294966    154'993'140'142     944     &ggml_compute_forward_dup
FUN 2294950 2294950    154'993'151'746 -123'126'943'060'968                       &_pthread_block_cancelation
FUN 2294950 2294950    154'993'159'880 -123'126'943'060'968                       &__sys_openat
FUN 2294950 2294966    154'993'158'234   1'232       &ggml_fp32_to_fp16_row
Illegal instruction (core dumped)
keen@bee:~/llamafile$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Celeron(R) N5095 @ 2.00GHz
    CPU family:          6
    Model:               156
    Thread(s) per core:  1
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            0
    CPU max MHz:         2900.0000
    CPU min MHz:         800.0000
    BogoMIPS:            3993.60
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
                         mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2
                         ss ht tm pbe syscall nx rdtscp lm constant_tsc art
                         arch_perfmon pebs bts rep_good nopl xtopology
                         nonstop_tsc cpuid aperfmperf tsc_known_freq pni
                         pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg
                         cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt
                         tsc_deadline_timer aes xsave rdrand lahf_lm
                         3dnowprefetch cpuid_fault epb cat_l2 cdp_l2 ssbd ibrs
                         ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority
                         ept vpid ept_ad fsgsbase tsc_adjust smep erms rdt_a
                         rdseed smap clflushopt clwb intel_pt sha_ni xsaveopt
                         xsavec xgetbv1 xsaves split_lock_detect dtherm ida
                         arat pln pts hwp hwp_notify hwp_act_window hwp_epp
                         hwp_pkg_req umip waitpkg gfni rdpid movdiri movdir64b
                         md_clear flush_l1d arch_capabilities
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):
  L1d:                   128 KiB (4 instances)
  L1i:                   128 KiB (4 instances)
  L2:                    1.5 MiB (1 instance)
  L3:                    4 MiB (1 instance)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-3
Vulnerabilities:
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Mitigation; Clear CPU buffers; SMT disabled
  Retbleed:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Enhanced IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Vulnerable: No microcode
  Tsx async abort:       Not affected
yapishu commented 10 months ago

Plus one: $ ./mistral-7b-instruct-server.llamafile --ftrace

FUN 2712860 2712915     28'046'631'127     944     &ggml_metal_supported
FUN 2712860 2712860     28'046'620'106 -123'137'703'420'504               &ggml_compute_forward.part.0
FUN 2712860 2712860     28'046'727'728 -123'137'703'420'136                 &ggml_metal_supported
FUN 2712860 2712860     28'046'729'536 -123'137'703'420'120                   &cosmo_once
FUN 2712860 2712860     28'046'730'849 -123'137'703'420'136                 &ggml_cublas_loaded
FUN 2712860 2712860     28'046'732'110 -123'137'703'420'120                   &cosmo_once
FUN 2712860 2712916     28'046'631'282     576   &ggml_compute_forward.part.0
FUN 2712860 2712860     28'046'734'079 -123'137'703'420'136                 &ggml_compute_forward_dup
FUN 2712860 2712860     28'046'738'463 -123'137'703'419'848                   &ggml_fp32_to_fp16_row
FUN 2712860 2712916     28'046'738'226     944     &ggml_metal_supported
FUN 2712860 2712916     28'046'741'931     960       &cosmo_once
FUN 2712860 2712916     28'046'745'814     944     &ggml_cublas_loaded
FUN 2712860 2712916     28'046'749'080     960       &cosmo_once
FUN 2712860 2712916     28'046'750'874     944     &ggml_compute_forward_dup
FUN 2712860 2712916     28'046'754'283   1'232       &ggml_fp32_to_fp16_row
FUN 2712860 2712860     28'046'758'872 -123'137'703'418'152                   &__oncrash
FUN 2712860 2712860     28'046'762'429 -123'137'703'418'104                     &__errno_location
FUN 2712860 2712860     28'046'769'785 -123'137'703'418'104                     &IsDebuggerPresent
FUN 2712860 2712860     28'046'776'415 -123'137'703'417'000                       &__errno_location
FUN 2712860 2712860     28'046'782'505 -123'137'703'417'000                       &_pthread_block_cancelation
FUN 2712860 2712860     28'046'788'948 -123'137'703'417'000                       &__sys_openat
Illegal instruction (core dumped)

cpuid output:

GenuineIntel Goldmont (Appliance Grade)

kX86CpuFamily       0x6
kX86CpuModel        0x5c

kX86CpuStepping     0x9
kX86CpuType         0
kX86CpuModelid      0xc
kX86CpuExtmodelid   0x5
kX86CpuFamilyid     0x6
kX86CpuExtfamilyid  0

TSC_AUX             0
 → core             0
 → node             0

Caches
──────
Level 1 data        6-way  24,576 byte cache w/    64 sets of 64 byte lines shared across 1 threads
Level 1 code        8-way  32,768 byte cache w/    64 sets of 64 byte lines shared across 1 threads
Level 2             16-way 1,048,576 byte cache w/ 1,024 sets of 64 byte lines shared across 4 threads

Features
────────
ACC                 present
ACPI                present
ADX                 unavailable
AES                 present
APIC                present
ARCH_CAPABILITIES   present
AVX                 unavailable
AVX2                unavailable
AVXVNNI             unavailable
AVX512BW            unavailable
AVX512CD            unavailable
AVX512DQ            unavailable
AVX512ER            unavailable
AVX512F             unavailable
AVX512IFMA          unavailable
AVX512PF            unavailable
AVX512VBMI          unavailable
AVX512VL            unavailable
AVX512_4FMAPS       unavailable
AVX512_4VNNIW       unavailable
AVX512_BF16         unavailable
AVX512_BITALG       unavailable
AVX512_VBMI2        unavailable
AVX512_VNNI         unavailable
AVX512_VP2INTERSECT unavailable
AVX512_VPOPCNTDQ    unavailable
BMI                 unavailable
BMI2                unavailable
CID                 unavailable
CLDEMOTE            unavailable
CLFLUSH             present
CLFLUSHOPT          present
CLWB                unavailable
CMOV                present
CQM                 unavailable
CX16                present
CX8                 present
DCA                 unavailable
DE                  present
DS                  present
DSCPL               present
DTES64              present
ERMS                present
EST                 present
F16C                unavailable
FDP_EXCPTN_ONLY     unavailable
FLUSH_L1D           unavailable
FMA                 unavailable
FPU                 present
FSGSBASE            present
FXSR                present
GBPAGES             present
GFNI                unavailable
HLE                 unavailable
HT                  present
HYPERVISOR          unavailable
IA64                unavailable
INTEL_PT            present
INTEL_STIBP         present
INVPCID             unavailable
LA57                unavailable
LM                  present
MCA                 present
MCE                 present
MD_CLEAR            present
MMX                 present
MOVBE               present
MOVDIR64B           unavailable
MOVDIRI             unavailable
MP                  unavailable
MPX                 present
MSR                 present
MTRR                present
MWAIT               present
NX                  present
OSPKE               unavailable
OSXSAVE             present
PAE                 present
PAT                 present
PBE                 present
PCID                unavailable
PCLMUL              present
PCONFIG             unavailable
PDCM                present
PGE                 present
PKU                 unavailable
PN                  unavailable
POPCNT              present
PSE                 present
PSE36               present
RDPID               unavailable
RDRND               present
RDSEED              present
RDTSCP              present
RDT_A               present
RTM                 unavailable
SDBG                present
SELFSNOOP           present
SEP                 present
SHA                 present
SMAP                present
SMEP                present
SMX                 unavailable
SPEC_CTRL           present
SPEC_CTRL_SSBD      unavailable
SSE                 present
SSE2                present
SSE3                present
SSE4_1              present
SSE4_2              present
SSSE3               present
SYSCALL             present
TM2                 present
TME                 unavailable
TSC                 present
TSC_ADJUST          present
TSC_DEADLINE_TIMER  present
TSX_FORCE_ABORT     unavailable
UMIP                unavailable
VAES                unavailable
VME                 present
VMX                 present
VPCLMULQDQ          unavailable
WAITPKG             unavailable
X2APIC              present
XSAVE               present
XTPR                present
ZERO_FCS_FDS        present

AMD Stuff
─────────
3DNOW               unavailable
3DNOWEXT            unavailable
3DNOWPREFETCH       present
ABM                 unavailable
BPEXT               unavailable
CMP_LEGACY          unavailable
CR8_LEGACY          unavailable
EXTAPIC             unavailable
FMA4                unavailable
FXSR_OPT            unavailable
IBS                 unavailable
LAHF_LM             present
LWP                 unavailable
MISALIGNSSE         unavailable
MMXEXT              unavailable
MWAITX              unavailable
NODEID_MSR          unavailable
OSVW                unavailable
OVERFLOW_RECOV      unavailable
PERFCTR_CORE        unavailable
PERFCTR_LLC         unavailable
PERFCTR_NB          unavailable
PTSC                unavailable
SKINIT              unavailable
SMCA                unavailable
SSE4A               unavailable
SUCCOR              unavailable
SVM                 unavailable
TBM                 unavailable
TCE                 unavailable
TOPOEXT             unavailable
WDT                 unavailable
XOP                 unavailable
yomajo commented 10 months ago

Same here, on an Intel(R) Core(TM) i5 CPU M 460 @ 2.53GHz running Ubuntu 22.04.3 LTS. The log ends like this when running ./llava-v1.5-7b-q4-server.llamafile:

...
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: PAD token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.12 MiB
llm_load_tensors: mem required  = 3891.36 MiB
..................................................................................................
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size  = 1024.00 MiB, K (f16):  512.00 MiB, V (f16):  512.00 MiB
llama_build_graph: non-view tensors processed: 676/676
llama_new_context_with_model: compute buffer total size = 159.32 MiB
Illegal instruction (core dumped)
jart commented 10 months ago

Taking a closer look at the implementation, I think this is happening because ggml_compute_fp32_to_fp16() is declared as inline. The fix should be pretty simple.
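The comment above doesn't spell out the fix. As a hedged illustration of the general pattern only (not the actual llamafile change), the sketch below keeps the F16C instruction inside one out-of-line function compiled with target("f16c"), and calls it only when the caller passes a flag obtained from a runtime CPU check; every other path uses a portable software conversion. It assumes GCC or Clang on x86 for the hardware path, and all function names here are hypothetical. The software routine follows the well-known bit-manipulation conversion from the FP16 library, which ggml's portable fallback is also based on.

```c
#include <stdint.h>
#include <string.h>

/* Bit-cast helpers; memcpy avoids strict-aliasing problems. */
static uint32_t fp32_to_bits(float f) { uint32_t w; memcpy(&w, &f, sizeof w); return w; }
static float fp32_from_bits(uint32_t w) { float f; memcpy(&f, &w, sizeof f); return f; }

/* Portable float -> IEEE binary16, round to nearest even. Assumes the default
   FP rounding mode and no -ffast-math, since the trick relies on exact IEEE math. */
uint16_t fp32_to_fp16_soft(float f) {
    const uint32_t w = fp32_to_bits(f);
    const float abs_f = fp32_from_bits(w & UINT32_C(0x7FFFFFFF));
    /* Values too large for a half overflow to infinity in the first multiply. */
    float base = (abs_f * 0x1.0p+112f) * 0x1.0p-110f;
    const uint32_t shl1_w = w + w;                 /* shifts out the sign bit */
    const uint32_t sign = w & UINT32_C(0x80000000);
    uint32_t bias = shl1_w & UINT32_C(0xFF000000);
    if (bias < UINT32_C(0x71000000))
        bias = UINT32_C(0x71000000);               /* clamp for subnormal halves */
    /* Adding this power of two makes float rounding happen at the half LSB. */
    base = fp32_from_bits((bias >> 1) + UINT32_C(0x07800000)) + base;
    const uint32_t bits = fp32_to_bits(base);
    const uint32_t exp_bits = (bits >> 13) & UINT32_C(0x00007C00);
    const uint32_t mantissa_bits = bits & UINT32_C(0x00000FFF);
    const uint32_t nonsign = exp_bits + mantissa_bits;
    /* NaN inputs become a quiet half NaN; everything else uses the rounded value. */
    return (uint16_t)((sign >> 16) |
                      (shl1_w > UINT32_C(0xFF000000) ? UINT32_C(0x7E00) : nonsign));
}

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
/* The only place VCVTPS2PH is emitted. noinline keeps it from being folded
   into callers that never checked CPUID. */
__attribute__((target("f16c"), noinline))
static uint16_t fp32_to_fp16_f16c(float f) {
    return (uint16_t)_cvtss_sh(f, 0);  /* imm 0: round to nearest even */
}
#endif

/* Hypothetical row converter: use_f16c must come from a runtime CPU check,
   never from the mere fact that the compiler can emit F16C. */
void fp32_to_fp16_row_dispatch(const float *x, uint16_t *y, int n, int use_f16c) {
#if defined(__x86_64__) || defined(__i386__)
    if (use_f16c) {
        for (int i = 0; i < n; i++)
            y[i] = fp32_to_fp16_f16c(x[i]);
        return;
    }
#else
    (void)use_f16c;  /* no F16C outside x86 */
#endif
    for (int i = 0; i < n; i++)
        y[i] = fp32_to_fp16_soft(x[i]);
}
```

On every CPU reported in this thread (Pentium G4560, Celeron N5095, the Goldmont part, and the i5 M 460), F16C is absent, so use_f16c would be 0 and only the portable path would run. The design point is that the F16C instruction is confined to a single function that is reachable only behind the runtime check.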