ggerganov / llama.cpp

LLM inference in C/C++
MIT License

[User] failed to find n_mult number from range 256, with n_ff = 3072 #2241

Closed vdg-github closed 1 year ago

vdg-github commented 1 year ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

Expected Behavior

I'm not sure if I should change this line in convert.py to: for n_mult in range(3000, 1, -1):
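
For context, the failing search lives in convert.py's find_n_mult. Below is a minimal sketch of its logic as of this build, reconstructed from the error message; the exact range and formula are assumptions and may differ between versions:

# Sketch of convert.py's n_mult search (assumed; actual code may differ by version).
def find_n_mult(n_ff: int, n_embd: int) -> int:
    for n_mult in range(256, 1, -1):
        # n_ff is assumed to be 8*n_embd/3 rounded up to a multiple of n_mult
        calc_ff = (((8 * n_embd) // 3 + n_mult - 1) // n_mult) * n_mult
        if calc_ff == n_ff:
            return n_mult
    raise Exception(f"failed to find n_mult for (n_ff = {n_ff}, n_embd = {n_embd})")

# For this model: (8 * 768) // 3 = 2048, and no n_mult <= 256 rounds 2048 up
# to exactly 3072, so the search fails. Widening the range, e.g. to
# range(3000, 1, -1), would return n_mult = 1536, since 2 * 1536 == 3072.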

Current Behavior

I tried to convert the model from https://huggingface.co/symanto/sn-xlm-roberta-base-snli-mnli-anli-xnli/tree/main and got:

params: n_vocab:250000 n_embd:768 n_head:12 n_layer:12
failed to find n_mult number

When I change the range to start from 3000:

root@jenkins-ddt:~/github/llama.cpp# ./quantize models/sn-xlm-roberta-base-snli-mnli-anli-xnli/ggml-model-f16.bin models/sn-xlm-roberta-base-snli-mnli-anli-xnli/ggml-model-q4_0.bin q4_0
main: build = 812 (1d16309)
main: quantizing 'models/sn-xlm-roberta-base-snli-mnli-anli-xnli/ggml-model-f16.bin' to 'models/sn-xlm-roberta-base-snli-mnli-anli-xnli/ggml-model-q4_0.bin' as Q4_0
llama.cpp: loading model from models/sn-xlm-roberta-base-snli-mnli-anli-xnli/ggml-model-f16.bin
llama.cpp: saving model to models/sn-xlm-roberta-base-snli-mnli-anli-xnli/ggml-model-q4_0.bin
llama_model_quantize_internal: model size  =     0.00 MB
llama_model_quantize_internal: quant size  =     0.00 MB

main: quantize time =   275.42 ms
main:    total time =   275.42 ms

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

$ lscpu

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   40 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              1
Core(s) per socket:              1
Socket(s):                       8
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           85
Model name:                      Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
Stepping:                        4
CPU MHz:                         2294.608
BogoMIPS:                        4589.21
Virtualization:                  VT-x
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       128 KiB
L1i cache:                       128 KiB
L2 cache:                        4 MiB
L3 cache:                        24.8 MiB
NUMA node0 CPU(s):               0-7
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Mmio stale data:   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT Host state unknown
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti ssbd ibrs ibpb tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves pku ospke md_clear

$ uname -a

Linux jenkins-tdd 5.4.0-131-generic #147-Ubuntu SMP Fri Oct 14 17:07:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
$ python3 --version
Python 3.10.11
$ make --version
GNU Make 4.2.1
Built for x86_64-pc-linux-gnu

$ g++ --version
g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

Green-Sky commented 1 year ago

  1. It looks like you are trying to use a non-llama model; you should look into using bert.cpp.
  2. n_mult will be replaced with something better once GGUF (the new file format) rolls around (see the sketch below).
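
For reference on point 2: GGUF stores hyperparameters as explicit key-value metadata, so values like n_ff are written directly instead of being reverse-engineered from n_mult. A minimal illustrative sketch; the key names follow the GGUF spec for llama-family models, but treat them as assumptions here:

# Old GGML v1/v2 headers stored n_mult, and loaders recomputed:
#   n_ff = ceil((8 * n_embd / 3) / n_mult) * n_mult
# which is exactly the guess that fails for this model (n_ff=3072, n_embd=768).
# GGUF instead stores the value itself as metadata, e.g.:
metadata = {
    "llama.embedding_length": 768,       # n_embd
    "llama.feed_forward_length": 3072,   # n_ff, stored directly; nothing to guess
}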
vdg-github commented 1 year ago

Thank you for your reply.

  1. It looks like you are trying to use a non-llama model; you should look into using bert.cpp.
  2. n_mult will be replaced with something better once GGUF (the new file format) rolls around.