Nabokov86 closed this issue 1 year ago
Version 1.44.2:
Identified as LLAMA model: (ver 6)
Attempting to Load...
---
Using automatic RoPE scaling (scale:1.000, base:10000.0)
System Info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /media/llm/ELYZA-japanese-Llama-2-7b-fast-instruct-q8_0.gguf (version GGUF V2 (latest))
llm_load_print_meta: format = GGUF V2 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 45043
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_ctx = 2048
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: n_ff = 11008
llm_load_print_meta: freq_base = 10000.0
llm_load_print_meta: freq_scale = 1
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = unknown, may not work
llm_load_print_meta: model params = 6.85 B
llm_load_print_meta: model size = 6.77 GiB (8.50 BPW)
llm_load_print_meta: general.name = ELYZA-japanese-Llama-2-7b-fast-instruct
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.09 MB
llm_load_tensors: mem required = 6937.00 MB (+ 1024.00 MB per state)
.................................................................................................
llama_new_context_with_model: kv self size = 1024.00 MB
llama_new_context_with_model: compute buffer total size = 153.47 MB
Load Model OK: True
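As a sanity check on the log above, the reported "kv self size = 1024.00 MB" can be reproduced from the printed hyperparameters. A minimal sketch, assuming an f16 KV cache (2 bytes per element), which is llama.cpp's default:

```python
# Reproduce "kv self size = 1024.00 MB" from the llm_load_print_meta values.
n_layer   = 32      # llm_load_print_meta: n_layer
n_ctx     = 2048    # llm_load_print_meta: n_ctx
n_embd    = 4096    # llm_load_print_meta: n_embd
n_head    = 32      # llm_load_print_meta: n_head
n_head_kv = 32      # n_gqa = n_head / n_head_kv = 1 (no grouped-query attention)

n_embd_kv = n_embd * n_head_kv // n_head   # K (or V) width per token
bytes_f16 = 2
kv_bytes  = 2 * n_layer * n_ctx * n_embd_kv * bytes_f16   # factor 2 = K and V
print(f"{kv_bytes / 2**20:.2f} MB")  # → 1024.00 MB
```

With n_gqa = 1 the KV width equals the full embedding width; grouped-query models shrink the cache by the n_head / n_head_kv ratio.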
Does it work with 1.45.2?
Yes, it does. I think 1.45.2 is the latest working version.
Will be fixed in the next version.
It should be fixed now.
Still doesn't work for me, although now I get a different error message.
llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /media/llm/ELYZA-japanese-Llama-2-7b-fast-instruct-q8_0.gguf (version GGUF V2 (latest))
GGML_ASSERT_CONTINUE: llama.cpp:2241: codepoints_from_utf8(word).size() > 0
error loading model: invalid character
llama_load_model_from_file: failed to load model
gpttype_load_model: error: failed to load model '/media/llm/ELYZA-japanese-Llama-2-7b-fast-instruct-q8_0.gguf'
Load Model OK: False
Could not load model: /media/llm/ELYZA-japanese-Llama-2-7b-fast-instruct-q8_0.gguf
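The failing assertion (`codepoints_from_utf8(word).size() > 0`) suggests at least one tokenizer vocab entry does not decode to any UTF-8 codepoint. A minimal Python sketch of a roughly equivalent check (the helper name is illustrative, not part of llama.cpp):

```python
def has_valid_codepoints(word: bytes) -> bool:
    """Roughly mirror llama.cpp's check that a vocab token
    decodes to at least one Unicode codepoint."""
    try:
        return len(word.decode("utf-8")) > 0
    except UnicodeDecodeError:
        return False

# A well-formed Japanese token passes; a truncated multi-byte
# sequence or an empty token fails the check.
print(has_valid_codepoints("日本語".encode("utf-8")))  # → True
print(has_valid_codepoints(b"\xe6\x97"))               # → False
print(has_valid_codepoints(b""))                       # → False
```

Running a check like this over the tokenizer entries of a GGUF file can help tell whether the model file itself carries a malformed token or the loader's validation is too strict.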
Can you link me to the model?
mmnga/ELYZA-japanese-Llama-2-7b-fast-instruct-gguf
https://huggingface.co/mmnga/ELYZA-japanese-Llama-2-7b-fast-instruct-gguf
I'm using the 8-bit (q8_0) version.
Please try again in 1.47.2.
It works now! Thanks!
I can't load the Japanese model "ELYZA-japanese-Llama-2-7b-fast-instruct-q8_0" using the latest concedo version. It used to work before.
Since I am unable to build llama.cpp myself, I can't test whether this is an upstream issue or not.