LostRuins / koboldcpp

Run GGUF models easily with a KoboldAI UI. One File. Zero Install.
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0

Japanese model issue: codepoints_from_utf8(word).size() > 0 #475

Closed · Nabokov86 closed this 1 year ago

Nabokov86 commented 1 year ago

I can't load the Japanese model "ELYZA-japanese-Llama-2-7b-fast-instruct-q8_0" with the latest concedo build; it worked in earlier versions.

Identified as LLAMA model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling (scale:1.000, base:10000.0)
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /media/llm/ELYZA-japanese-Llama-2-7b-fast-instruct-q8_0.gguf (version GGUF V2 (latest))
GGML_ASSERT: llama.cpp:2164: codepoints_from_utf8(word).size() > 0
Aborted (core dumped)

Since I'm unable to build llama.cpp myself, I can't test whether this is an upstream issue.
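For context on the assert itself: llama.cpp decodes every vocabulary entry from UTF-8 into codepoints while building the tokenizer, and `GGML_ASSERT: codepoints_from_utf8(word).size() > 0` fires when a token string yields zero codepoints, i.e. its bytes are not valid UTF-8 on their own. This ELYZA "fast" model extends the Llama-2 tokenizer with Japanese tokens (n_vocab = 45043 vs. the base 32000), which is plausibly where such an entry comes from. A minimal sketch of a decoder with that contract, assuming behavior like llama.cpp's `codepoints_from_utf8` (this is not the actual implementation):

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for llama.cpp's codepoints_from_utf8():
// decode a UTF-8 string into Unicode codepoints, returning an empty
// vector if any byte sequence is malformed.
// (A real decoder would also reject overlong encodings and surrogates.)
static std::vector<uint32_t> codepoints_from_utf8(const std::string & s) {
    std::vector<uint32_t> cps;
    for (size_t i = 0; i < s.size(); ) {
        uint8_t b = s[i];
        uint32_t cp;
        size_t len;
        if      ( b         <  0x80) { cp = b;        len = 1; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; }
        else return {};                         // invalid lead byte
        if (i + len > s.size()) return {};      // sequence truncated
        for (size_t j = 1; j < len; ++j) {
            uint8_t cont = s[i + j];
            if ((cont & 0xC0) != 0x80) return {}; // invalid continuation byte
            cp = (cp << 6) | (cont & 0x3F);
        }
        cps.push_back(cp);
        i += len;
    }
    return cps;
}

int main() {
    std::string ok  = "\xE6\x97\xA5\xE6\x9C\xAC"; // "日本", valid UTF-8
    std::string bad = "\xE6\x97";                 // truncated multi-byte sequence

    std::cout << codepoints_from_utf8(ok).size()  << "\n"; // 2
    std::cout << codepoints_from_utf8(bad).size() << "\n"; // 0 -> would trip the assert
}
```

Under this reading, a single vocab entry holding a partial multi-byte sequence is enough to abort the whole load, which matches the "Aborted (core dumped)" above.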

Nabokov86 commented 1 year ago

Version 1.44.2:

Identified as LLAMA model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling (scale:1.000, base:10000.0)
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /media/llm/ELYZA-japanese-Llama-2-7b-fast-instruct-q8_0.gguf (version GGUF V2 (latest))
llm_load_print_meta: format         = GGUF V2 (latest)
llm_load_print_meta: arch           = llama
llm_load_print_meta: vocab type     = SPM
llm_load_print_meta: n_vocab        = 45043
llm_load_print_meta: n_merges       = 0
llm_load_print_meta: n_ctx_train    = 4096
llm_load_print_meta: n_ctx          = 2048
llm_load_print_meta: n_embd         = 4096
llm_load_print_meta: n_head         = 32
llm_load_print_meta: n_head_kv      = 32
llm_load_print_meta: n_layer        = 32
llm_load_print_meta: n_rot          = 128
llm_load_print_meta: n_gqa          = 1
llm_load_print_meta: f_norm_eps     = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: n_ff           = 11008
llm_load_print_meta: freq_base      = 10000.0
llm_load_print_meta: freq_scale     = 1
llm_load_print_meta: model type     = 7B
llm_load_print_meta: model ftype    = unknown, may not work
llm_load_print_meta: model params   = 6.85 B
llm_load_print_meta: model size     = 6.77 GiB (8.50 BPW) 
llm_load_print_meta: general.name   = ELYZA-japanese-Llama-2-7b-fast-instruct
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.09 MB
llm_load_tensors: mem required  = 6937.00 MB (+ 1024.00 MB per state)
.................................................................................................
llama_new_context_with_model: kv self size  = 1024.00 MB
llama_new_context_with_model: compute buffer total size =  153.47 MB
Load Model OK: True
LostRuins commented 1 year ago

Does it work with 1.45.2?

Nabokov86 commented 1 year ago

Yes, it does. I think 1.45.2 is the latest working version.

LostRuins commented 1 year ago

Will be fixed in the next version.

LostRuins commented 1 year ago

It should be fixed now.

Nabokov86 commented 1 year ago

Still doesn't work for me, although now it's a different error message.

llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /media/llm/ELYZA-japanese-Llama-2-7b-fast-instruct-q8_0.gguf (version GGUF V2 (latest))
GGML_ASSERT_CONTINUE: llama.cpp:2241: codepoints_from_utf8(word).size() > 0
error loading model: invalid character
llama_load_model_from_file: failed to load model
gpttype_load_model: error: failed to load model '/media/llm/ELYZA-japanese-Llama-2-7b-fast-instruct-q8_0.gguf'
Load Model OK: False
Could not load model: /media/llm/ELYZA-japanese-Llama-2-7b-fast-instruct-q8_0.gguf
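The change in failure mode is visible in the two logs: the old build hit `GGML_ASSERT` and died with "Aborted (core dumped)", while this build hits `GGML_ASSERT_CONTINUE`, logs the failed check, and lets the loader return a normal "failed to load model" error instead of crashing the process. A hedged sketch of what such a macro pair could look like (the actual koboldcpp definitions may differ):

```cpp
#include <cstdio>
#include <cstdlib>

// Hard assert: print the failed condition and abort the process.
// This matches the "Aborted (core dumped)" behavior in the first log.
#define GGML_ASSERT(x)                                           \
    do {                                                         \
        if (!(x)) {                                              \
            fprintf(stderr, "GGML_ASSERT: %s:%d: %s\n",          \
                    __FILE__, __LINE__, #x);                     \
            abort();                                             \
        }                                                        \
    } while (0)

// Hypothetical continue-able variant: log the failed check but keep
// running, so the caller can surface an ordinary load error.
#define GGML_ASSERT_CONTINUE(x)                                  \
    do {                                                         \
        if (!(x)) {                                              \
            fprintf(stderr, "GGML_ASSERT_CONTINUE: %s:%d: %s\n", \
                    __FILE__, __LINE__, #x);                     \
        }                                                        \
    } while (0)

int main() {
    int vocab_entry_valid = 0;                  // pretend a vocab entry failed validation
    GGML_ASSERT_CONTINUE(vocab_entry_valid);    // logs, but execution continues
    fprintf(stderr, "error loading model: invalid character\n"); // loader fails gracefully
    return 1;
}
```

So the crash became a recoverable error, but the underlying vocabulary problem still blocks the load at this point in the thread.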
LostRuins commented 1 year ago

Can you link me to the model?

Nabokov86 commented 1 year ago

mmnga/ELYZA-japanese-Llama-2-7b-fast-instruct-gguf

https://huggingface.co/mmnga/ELYZA-japanese-Llama-2-7b-fast-instruct-gguf

I'm using the 8-bit quant (q8_0).

LostRuins commented 1 year ago

Please try again in 1.47.2.

Nabokov86 commented 1 year ago

It works now! Thanks!