I think tokenizer.model was missing from the directory you converted from. Right now, some of those scripts just skip including the vocabulary if the file isn't there, without informing the user.
llama_model_loader: loaded meta data with 15 key-value pairs and 291 tensors from models/7B/ggml-model-f16.gguf (version GGUF V1L����.
llama_model_loader: - tensor 0: token_embd.weight f16 [ 4096, 46336, 1, 1 ]
The special characters at the GGUF version also look kind of weird. I'm pretty sure your main issue is the tokenizer.model thing though.
Dear KerfuffleV2,
Thank you for your reply. As you mentioned, there was no tokenizer.model file in the model I was trying to convert to GGUF. But I checked that the tokenizer.json file is there. I'm sorry to keep asking questions, but can I ask you how to make tokenizer.model?
MODEL URL : https://huggingface.co/kfkas/Llama-2-ko-7b-Chat/tree/main
FILE LIST: .gitattributes, LICENSE, README.md, config.json, generation_config.json, pytorch_model-00001-of-00002.bin, pytorch_model-00002-of-00002.bin, pytorch_model.bin.index.json, special_tokens_map.json, tokenizer.json, tokenizer_config.json
I'm sorry to keep asking questions, but can I ask you how to make tokenizer.model?
No need to apologize. I think because it's a Korean model that it uses a different tokenizer type than that script expects. From your link:
"Since Llama-2-Ko uses FastTokenizer provided by HF tokenizers NOT sentencepiece package, it is required to use use_fast=True option when initialize tokenizer."
I'm not an expert on this, but I think that may mean it uses a BPE tokenizer rather than SPM (which is typical for LLaMA models). I don't know if it will work, but you can try using the main convert.py script with --vocabtype bpe
It's possible this model uses a type of tokenizer or configuration that llama.cpp doesn't currently support.
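If it helps, here is a minimal sketch (not part of any llama.cpp script; the model path is just the one from this thread) for checking which kind of tokenizer a downloaded model directory actually ships with, based only on the files it contains:

import json
from pathlib import Path

# Hypothetical local path; adjust to wherever the model was downloaded.
model_dir = Path("models/kfkas_Llama-2-ko-7b-Chat")

# A SentencePiece (SPM) model normally ships a tokenizer.model file;
# an HF "fast" tokenizer ships tokenizer.json with a "model"."type" field.
print("has tokenizer.model:", (model_dir / "tokenizer.model").exists())

tok_json = model_dir / "tokenizer.json"
if tok_json.exists():
    with open(tok_json, encoding="utf-8") as f:
        tokenizer = json.load(f)
    print("tokenizer.json model type:", tokenizer["model"]["type"])  # e.g. "BPE"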
I'm not an expert on this, but I think that may mean it uses a BPE tokenizer rather than SPM
In tokenizer.json it looks like it uses the BPE tokenizer:
...
"model": {
"type": "BPE",
"dropout": null,
...
I think I need the vocab.json file. However, there is an error because this file is not in this model folder.
E:\AI\llama.cpp>python convert.py --vocabtype bpe --outfile a.gguf .\models\kfkas_Llama-2-ko-7b-Chat
Loading model file models\kfkas_Llama-2-ko-7b-Chat\pytorch_model-00001-of-00002.bin
Loading model file models\kfkas_Llama-2-ko-7b-Chat\pytorch_model-00001-of-00002.bin
Loading model file models\kfkas_Llama-2-ko-7b-Chat\pytorch_model-00002-of-00002.bin
params = Params(n_vocab=46336, n_embd=4096, n_mult=5504, n_layer=32, n_ctx=2048, n_ff=11008, n_head=32, n_head_kv=32, f_norm_eps=1e-05, f_rope_freq_base=None, f_rope_scale=None, ftype=None, path_model=WindowsPath('models/kfkas_Llama-2-ko-7b-Chat'))
Traceback (most recent call last):
File "E:\AI\llama.cpp\convert.py", line 1172, in
I think I need the vocab.json file. However, there is an error because this file is not in this model folder.
No, the conversion script does this wrong; it should use the tokenizer.json file if it exists.
I think this little script will work for extracting the vocab:
import json, sys

# Read tokenizer.json from standard input and write only the vocab mapping to standard output.
tokenizer = json.load(sys.stdin)
json.dump(tokenizer['model']['vocab'], sys.stdout)
It reads from standard input and writes to standard output so you'll need to do something like:
python blah.py < tokenizer.json > vocab.json
I've made progress thanks to your continued guidance. As you said, I made the blah.py script and ran it at the DOS prompt, and the vocab.json file was created successfully. Thank you.
And by executing the command below, I checked that the a.gguf file was also created without errors! python convert.py --vocabtype bpe --outfile a.gguf .\models\kfkas_Llama-2-ko-7b-Chat
--- output ----
[289/291] Writing tensor blk.31.ffn_norm.weight | size 4096 | type F32 | T+ 23
[290/291] Writing tensor output_norm.weight | size 4096 | type F32 | T+ 23
[291/291] Writing tensor output.weight | size 46336 x 4096 | type F16 | T+ 23
Wrote a.gguf
--- a.gguf's info --- 2023-08-29 PM 09:31 13,713,148,992 a.gguf
But when running 'E:\AI\llama.cpp>main -m a.gguf', there is a problem: the LLM is supposed to generate some text, but nothing comes out. I think the gguf file was made correctly, so it's strange.
--- output ----
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q8_0: 226 tensors
llm_load_print_meta: format = GGUF V1 (support until nov 2023)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 46336
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_ctx = 512
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff = 11008
llm_load_print_meta: freq_base = 10000.0
llm_load_print_meta: freq_scale = 1
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = mostly Q8_0
llm_load_print_meta: model size = 6.86 B
llm_load_print_meta: general.name = LLaMA
llm_load_print_meta: BOS token = 1 ''
llm_load_print_meta: EOS token = 2 ''
llm_load_print_meta: UNK token = 0 '
system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0
<----- no generated string
What if you specify a prompt like:
main -m a.gguf -p "Why is the sky blue?"
import json, sys
tokenizer = json.load(sys.stdin)
json.dump(tokenizer['model']['vocab'], sys.stdout, ensure_ascii=False)
I added 'ensure_ascii=False' to json dump due to Korean Unicode display problem.
run : main -m a.gguf -p "Why is the sky blue?"
output :
system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0
<----- I waited for about 5 minutes, but no strings were generated.
^C E:\AI\llama.cpp>
If there was going to be output, you'd see it pretty quickly. This is almost certainly an issue with the vocabulary but I'm not knowledgeable enough to really fix it.
Just in case it's something to do with the ensure_ascii thing or using redirection, you can try this alternative for converting the vocab:
import json

# Read tokenizer.json directly from the file (no shell redirection involved).
with open("tokenizer.json", "r", encoding="utf-8") as f:
    tokenizer = json.load(f)

# Write only the vocab mapping out as vocab.json.
with open("vocab.json", "w", encoding="utf-8") as f:
    json.dump(tokenizer['model']['vocab'], f)
I doubt it will make a difference though. If not, hopefully someone else will be able to help you.
dear KerfuffleV2
I created vocab.json with the modified code, but it is not generating the same string. I've been searching online, but it's not an easy fight :-) Thank you for helping me all day long.
I created vocab.json with the modified code, but it is not generating the same string.
Do you just mean the result is the same: no output? If so, unfortunately that's pretty much what I expected because I didn't expect the second version of the conversion script to really make a difference.
I don't think you're doing anything wrong, it just doesn't seem like llama.cpp currently supports that particular model.
I'd suggest keeping this issue open but editing it a bit to be something more like "Converting kfkas Llama-2-ko-7b-Chat to GGUF fails" or possibly create a different issue like "Please add support for kfkas llama-2-ko-7b-chat" and link here for context.
Do you just mean the result is the same: no output?
yes. The same string was not generated.
As you said, I revised the title of this issue and registered a new issue. Thank you for your advice.^^
I could reproduce this on the original Llama 2 with --vocabtype bpe.
Note that the tokenizer.json of Llama 2 says type == BPE although they indeed have tokenizer.model, and I confirmed the Llama 2 gguf worked with tokenizer.model (namely without --vocabtype bpe):
(snip)
"model": {
"type": "BPE",
"dropout": null,
"unk_token": "<unk>",
"continuing_subword_prefix": null,
"end_of_word_suffix": null,
"fuse_unk": true,
"byte_fallback": true,
"vocab": {
"<unk>": 0,
(snip)
I downloaded the Llama 2 files in models/Llama-2-7b-chat-hf and then:
# create vocab.json
$ cat models/Llama-2-7b-chat-hf/tokenizer.json | jq --ascii-output '.model.vocab' > models/Llama-2-7b-chat-hf/vocab.json
$ python convert.py models/Llama-2-7b-chat-hf --vocabtype bpe
$ ./main -m models/Llama-2-7b-chat-hf/ggml-model-f16.gguf --verbose-prompt -n 128 -p "$(echo "<s>[INST] How are you? [/INST]")"
(snip)
llm_load_print_meta: format = GGUF V1 (support until nov 2023)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 4096
llm_load_print_meta: n_ctx = 512
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-06
llm_load_print_meta: n_ff = 11008
llm_load_print_meta: freq_base = 10000.0
llm_load_print_meta: freq_scale = 1
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = mostly F16
llm_load_print_meta: model size = 6.74 B
llm_load_print_meta: general.name = models
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.09 MB
llm_load_tensors: mem required = 12853.10 MB (+ 256.00 MB per state)
...................................................................................................
llama_new_context_with_model: kv self size = 256.00 MB
llama_new_context_with_model: compute buffer total size = 71.91 MB
system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
main: prompt: '<s>[INST] How are you? [/INST]'
main: number of tokens in prompt = 18
1 -> ''
529 -> ''
29879 -> ''
24566 -> ''
29902 -> ''
3059 -> ''
29911 -> ''
29962 -> ''
1128 -> ''
526 -> ''
366 -> ''
29973 -> ''
518 -> ''
29914 -> ''
29902 -> ''
3059 -> ''
29911 -> ''
29962 -> ''
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0
[end of text]
I wonder if vocab type = SPM is correct in this setting.
(Of course, I can do it with tokenizer.model in the case of the original Llama 2, but the model I want to try does not have tokenizer.model.)
I wonder if vocab type = SPM is correct in this setting.
No, the conversion script should set the tokenizer model kv properly to gpt2 when the source model uses a BPE tokenizer.
@KerfuffleV2
No, the conversion script should set the tokenizer model kv properly to gpt2 when the source model uses a BPE tokenizer.
Ahh, it seems like convert.py just always sets it to llama no matter what. I can fix it in #2842
@kurugai If you want to try what klosax suggested, find the line
self.gguf.add_tokenizer_model("llama")
in convert.py and change it to this:
if isinstance(vocab, SentencePieceVocab):
self.gguf.add_tokenizer_model("llama")
elif isinstance(vocab, BpeVocab):
self.gguf.add_tokenizer_model("gpt2")
else:
raise ValueError(f'Unknown vocab type: Not BpeVocab or SentencePieceVocab')
@KerfuffleV2 I modified convert.py as follows.
#self.gguf.add_tokenizer_model("llama")
if isinstance(vocab, SentencePieceVocab):
self.gguf.add_tokenizer_model("llama")
elif isinstance(vocab, BpeVocab):
self.gguf.add_tokenizer_model("gpt2")
else:
raise ValueError(f'Unknown vocab type: Not BpeVocab or SentencePieceVocab')
And I made 'a.gguf' using the command below.
python convert.py --vocabtype bpe --outfile a.gguf .\models\kfkas_Llama-2-ko-7b-Chat
However, when executing the main command, the following error message was displayed during the model loading process.
main -m a.gguf -p "Why is the sky blue?"
................. (omission)
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type f16: 226 tensors
error loading model: cannot find tokenizer merges in model file
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'a.gguf'
main: error: unable to load model
Ahh, I forgot the version in master doesn't handle merges. If you're comfortable with testing a pull, you can try checking out #2842 and using that (you'll need to install the GGUF package from that pull as well).
Unless you're really impatient, your best bet is probably to just wait until that pull gets merged. That will hopefully fix this issue.
edit: Just want to add that I'd be really happy for people to test those changes. So if you do want to try it but need to ask some questions first, that's no problem. Don't be afraid of bothering me; it's up to you whether you feel like going through the trouble or not.
@KerfuffleV2
Thank you for your feedback. To be honest, I'm not used to testing pull requests, so I'll wait until it merges. The day it's merged, I'll check it right away. Thank you for your sincere help.
I ran make clean and make after checking out KerfuffleV2/feat-scripts-improvements.
I got:
$ python convert.py models/Llama-2-7b-chat-hf --vocabtype bpe
Traceback (most recent call last):
File "/Users/xxxx/projects/ggerganov/llama.cpp/convert.py", line 808, in <module>
class OutputFile:
File "/Users/xxxx/projects/ggerganov/llama.cpp/convert.py", line 859, in OutputFile
def add_meta_special_vocab(self, svocab: gguf.SpecialVocab) -> None:
^^^^^^^^^^^^^^^^^
AttributeError: module 'gguf' has no attribute 'SpecialVocab'
Do you have any idea to solve this?
Do you have any idea to solve this?
You need to install the gguf Python package from that fork. Assuming you're already in a Python virtual environment, you can do pip install --upgrade ./gguf-py. You might need to reactivate the environment also.
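(As a quick sanity check, and purely as a sketch: this just confirms which gguf package Python is actually importing and whether it has the SpecialVocab attribute from the traceback above.)

import gguf

print(gguf.__file__)                   # shows which installed copy of the package is being imported
print(hasattr(gguf, "SpecialVocab"))   # should be True after installing ./gguf-py from the pull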
Thank you. I've totally forgotten about pip stuff.
I ran this in addition to https://github.com/ggerganov/llama.cpp/issues/2865#issuecomment-1697849098 (although without merges.txt I got no error on convert.py):
$ cat models/Llama-2-7b-chat-hf/tokenizer.json | jq -r --ascii-output '.model.merges[]' > models/Llama-2-7b-chat-hf/merges.txt
And then:
$ ./main -m models/Llama-2-7b-chat-hf/ggml-model-f16.gguf --verbose-prompt -n 128 -p "$(echo "<s>[INST] How are you? [/INST]")"
(snip)
ERROR: byte not found in vocab: '
'
zsh: segmentation fault ./main -m models/Llama-2-7b-chat-hf/ggml-model-f16.gguf --verbose-prompt -n
Any idea? 😭
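(Side note: for anyone on Windows without jq, a rough Python equivalent of the two jq extraction commands above might look like the sketch below. The directory matches the commands, and it assumes the merges entries in tokenizer.json are plain strings, as they are for Llama 2.)

import json

model_dir = "models/Llama-2-7b-chat-hf"  # same directory as in the jq commands above

with open(f"{model_dir}/tokenizer.json", encoding="utf-8") as f:
    tok = json.load(f)

# .model.vocab -> vocab.json (ASCII-escaped by default, like jq --ascii-output)
with open(f"{model_dir}/vocab.json", "w", encoding="utf-8") as f:
    json.dump(tok["model"]["vocab"], f)

# .model.merges[] -> merges.txt, one merge per line (like jq -r '.model.merges[]')
with open(f"{model_dir}/merges.txt", "w", encoding="utf-8", newline="\n") as f:
    f.write("\n".join(tok["model"]["merges"]) + "\n")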
Any idea?
You're just trying this with a normal LLaMA 2 model, not the one OP was testing, right? The only thing I can think of is that it's because you're using a model that wasn't intended to use the BPE tokenizer mode. I'm not an expert on the tokenizer stuff so that idea might not be worth too much. I'm going to download OP's exact model and try it; if I get the same result as you, then we'll know it's not because of what I mentioned.
edit: Your issue looks like #2889 so maybe it's just an issue with the BPE tokenizer and nothing you did. You could try loading the model you generated with #2842 using main compiled from #2889 and see if that fixes your issue.
edit: So, I got OP's Korean model converted (it did require generating vocab.json). This does need #2889 to avoid dying immediately. All the token contents still map to blank strings because convert adds BPE vocab tokens as USER_DEFINED but there's no case to handle converting those to strings (there's a partial workaround in the comments for that pull).
Thank you for your reply. https://github.com/ggerganov/llama.cpp/pull/2889 should be exactly my issue.
Unfortunately, even with the change I suggested in the comments there it's still not really going to be correct. You'll see stuff like <0x20> instead of spaces.
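(For context, tokens like <0x20> are byte-fallback entries. Purely as an illustration, and not something llama.cpp does for you here, mapping such a token back to its character could look like this:)

import re

def decode_byte_token(tok: str) -> str:
    # "<0x20>" -> " "; anything that is not a <0xNN> byte token is returned unchanged
    m = re.fullmatch(r"<0x([0-9A-Fa-f]{2})>", tok)
    return chr(int(m.group(1), 16)) if m else tok

print(repr(decode_byte_token("<0x20>")))  # ' '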
@KerfuffleV2
Hi. I think it's been merged, so I installed fresh llama.cpp and gguf packages and made 'a.gguf' in the same way as yesterday. The following errors were displayed when running main, and no inference string was generated.
Is it correct that the merge has been completed?
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
Below is the full log of the main command.
E:\AI\llama.cpp>main -m a.gguf -p "Why is the sky blue?"
Log start
main: build = 1128 (b532a69)
main: seed = 1693403947
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from a.gguf (version GGUF V2 (latest))
llama_model_loader: - tensor 0: token_embd.weight q8_0 [ 4096, 46336, 1, 1 ]
llama_model_loader: - tensor 1: blk.0.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 2: blk.0.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 3: blk.0.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 4: blk.0.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 5: blk.0.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 6: blk.0.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 7: blk.0.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 8: blk.0.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 9: blk.0.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 10: blk.1.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 11: blk.1.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 12: blk.1.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 13: blk.1.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 14: blk.1.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 15: blk.1.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 16: blk.1.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 17: blk.1.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 18: blk.1.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 19: blk.2.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 20: blk.2.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 21: blk.2.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 22: blk.2.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 23: blk.2.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 24: blk.2.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 25: blk.2.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 26: blk.2.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 27: blk.2.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 28: blk.3.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 29: blk.3.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 30: blk.3.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 31: blk.3.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 32: blk.3.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 33: blk.3.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 34: blk.3.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 35: blk.3.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 36: blk.3.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 37: blk.4.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 38: blk.4.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 39: blk.4.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 40: blk.4.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 41: blk.4.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 42: blk.4.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 43: blk.4.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 44: blk.4.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 45: blk.4.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 46: blk.5.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 47: blk.5.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 48: blk.5.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 49: blk.5.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 50: blk.5.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 51: blk.5.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 52: blk.5.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 53: blk.5.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 54: blk.5.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 55: blk.6.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 56: blk.6.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 57: blk.6.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 58: blk.6.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 59: blk.6.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 60: blk.6.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 61: blk.6.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 62: blk.6.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 63: blk.6.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 64: blk.7.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 65: blk.7.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 66: blk.7.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 67: blk.7.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 68: blk.7.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 69: blk.7.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 70: blk.7.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 71: blk.7.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 72: blk.7.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 73: blk.8.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 74: blk.8.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 75: blk.8.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 76: blk.8.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 77: blk.8.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 78: blk.8.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 79: blk.8.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 80: blk.8.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 81: blk.8.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 82: blk.9.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 83: blk.9.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 84: blk.9.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 85: blk.9.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 86: blk.9.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 87: blk.9.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 88: blk.9.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 89: blk.9.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 90: blk.9.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 91: blk.10.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 92: blk.10.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 93: blk.10.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 94: blk.10.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 95: blk.10.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 96: blk.10.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 97: blk.10.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 98: blk.10.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 99: blk.10.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 100: blk.11.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 101: blk.11.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 102: blk.11.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 103: blk.11.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 104: blk.11.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 105: blk.11.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 106: blk.11.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 107: blk.11.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 108: blk.11.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 109: blk.12.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 110: blk.12.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 111: blk.12.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 112: blk.12.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 113: blk.12.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 114: blk.12.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 115: blk.12.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 116: blk.12.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 117: blk.12.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 118: blk.13.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 119: blk.13.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 120: blk.13.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 121: blk.13.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 122: blk.13.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 123: blk.13.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 124: blk.13.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 125: blk.13.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 126: blk.13.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 127: blk.14.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 128: blk.14.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 129: blk.14.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 130: blk.14.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 131: blk.14.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 132: blk.14.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 133: blk.14.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 134: blk.14.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 135: blk.14.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 136: blk.15.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 137: blk.15.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 138: blk.15.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 139: blk.15.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 140: blk.15.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 141: blk.15.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 142: blk.15.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 143: blk.15.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 144: blk.15.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 145: blk.16.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 146: blk.16.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 147: blk.16.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 148: blk.16.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 149: blk.16.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 150: blk.16.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 151: blk.16.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 152: blk.16.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 153: blk.16.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 154: blk.17.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 155: blk.17.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 156: blk.17.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 157: blk.17.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 158: blk.17.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 159: blk.17.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 160: blk.17.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 161: blk.17.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 162: blk.17.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 163: blk.18.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 164: blk.18.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 165: blk.18.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 166: blk.18.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 167: blk.18.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 168: blk.18.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 169: blk.18.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 170: blk.18.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 171: blk.18.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 172: blk.19.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 173: blk.19.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 174: blk.19.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 175: blk.19.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 176: blk.19.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 177: blk.19.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 178: blk.19.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 179: blk.19.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 180: blk.19.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 181: blk.20.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 182: blk.20.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 183: blk.20.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 184: blk.20.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 185: blk.20.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 186: blk.20.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 187: blk.20.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 188: blk.20.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 189: blk.20.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 190: blk.21.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 191: blk.21.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 192: blk.21.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 193: blk.21.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 194: blk.21.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 195: blk.21.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 196: blk.21.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 197: blk.21.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 198: blk.21.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 199: blk.22.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 200: blk.22.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 201: blk.22.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 202: blk.22.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 203: blk.22.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 204: blk.22.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 205: blk.22.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 206: blk.22.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 207: blk.22.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 208: blk.23.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 209: blk.23.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 210: blk.23.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 211: blk.23.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 212: blk.23.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 213: blk.23.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 214: blk.23.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 215: blk.23.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 216: blk.23.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 217: blk.24.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 218: blk.24.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 219: blk.24.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 220: blk.24.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 221: blk.24.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 222: blk.24.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 223: blk.24.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 224: blk.24.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 225: blk.24.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 226: blk.25.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 227: blk.25.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 228: blk.25.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 229: blk.25.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 230: blk.25.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 231: blk.25.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 232: blk.25.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 233: blk.25.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 234: blk.25.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 235: blk.26.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 236: blk.26.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 237: blk.26.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 238: blk.26.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 239: blk.26.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 240: blk.26.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 241: blk.26.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 242: blk.26.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 243: blk.26.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 244: blk.27.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 245: blk.27.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 246: blk.27.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 247: blk.27.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 248: blk.27.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 249: blk.27.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 250: blk.27.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 251: blk.27.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 252: blk.27.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 253: blk.28.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 254: blk.28.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 255: blk.28.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 256: blk.28.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 257: blk.28.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 258: blk.28.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 259: blk.28.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 260: blk.28.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 261: blk.28.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 262: blk.29.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 263: blk.29.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 264: blk.29.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 265: blk.29.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 266: blk.29.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 267: blk.29.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 268: blk.29.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 269: blk.29.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 270: blk.29.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 271: blk.30.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 272: blk.30.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 273: blk.30.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 274: blk.30.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 275: blk.30.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 276: blk.30.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 277: blk.30.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 278: blk.30.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 279: blk.30.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 280: blk.31.attn_q.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 281: blk.31.attn_k.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 282: blk.31.attn_v.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 283: blk.31.attn_output.weight q8_0 [ 4096, 4096, 1, 1 ]
llama_model_loader: - tensor 284: blk.31.ffn_gate.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 285: blk.31.ffn_down.weight q8_0 [ 11008, 4096, 1, 1 ]
llama_model_loader: - tensor 286: blk.31.ffn_up.weight q8_0 [ 4096, 11008, 1, 1 ]
llama_model_loader: - tensor 287: blk.31.attn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 288: blk.31.ffn_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 289: output_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 290: output.weight q8_0 [ 4096, 46336, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: llama.context_length u32
llama_model_loader: - kv 3: llama.embedding_length u32
llama_model_loader: - kv 4: llama.block_count u32
llama_model_loader: - kv 5: llama.feed_forward_length u32
llama_model_loader: - kv 6: llama.rope.dimension_count u32
llama_model_loader: - kv 7: llama.attention.head_count u32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv 10: general.file_type u32
llama_model_loader: - kv 11: tokenizer.ggml.model str
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr
llama_model_loader: - kv 13: tokenizer.ggml.scores arr
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr
llama_model_loader: - kv 15: tokenizer.ggml.merges arr
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q8_0: 226 tensors
ERROR: byte not found in vocab: '
'
llm_load_print_meta: format = GGUF V2 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 46336
llm_load_print_meta: n_merges = 77738
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_ctx = 512
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff = 11008
llm_load_print_meta: freq_base = 10000.0
llm_load_print_meta: freq_scale = 1
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = mostly Q8_0
llm_load_print_meta: model size = 6.86 B
llm_load_print_meta: general.name = models
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token = 0 '<unk>'
llm_load_tensors: ggml ctx size = 0.09 MB
llm_load_tensors: mem required = 6947.73 MB (+ 256.00 MB per state)
.................................................................................................
llama_new_context_with_model: kv self size = 256.00 MB
llama_new_context_with_model: compute buffer total size = 99.97 MB
system_info: n_threads = 8 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: ' '
ERROR: byte not found in vocab: '
'
ERROR: byte not found in vocab: '
'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0
Is it correct that the merge has been completed?
Yes, it got merged today. Unfortunately, that wasn't enough to fix models using BPE (like this one). Look a little bit higher in the thread: I linked to a pull with a fix for the "byte not found" thing. However, even with that change the content of all the tokens is still blank. There's a partial fix in the comments, but there are still problems.
The good news is that it seems like people are aware of at least some of the problems and they're being looked at/worked on.
@KerfuffleV2
The good news is that it seems like people are aware of at least some of the problems and they're being looked at/worked on.
Good news! I will try again whenever there is a related source change in the future. :)
@kurugai The byte not found in vocab errors might have been solved now that https://github.com/ggerganov/llama.cpp/pull/2889 has been merged.
(I failed with this error though: https://github.com/ggerganov/llama.cpp/issues/2965)
@KerfuffleV2 I successfully converted with the convert.py at the site below. Thank you for your help in the meantime. I'll close this one. https://github.com/strutive07/llama.cpp/tree/convert_hf_vocab
Hi. I'm trying to convert the 'kfkas/Llama-2-ko-7b-Chat' model I downloaded from Hugging Face on Windows 11 into a gguf file. So I tried to convert it with the command below.
C:\AI\llama.cpp>python convert-llama-hf-to-gguf.py .\models\kfkas_Llama-2-ko-7b-Chat 1
The conversion was successful, but when I tried to run the result, it could not be executed.
Can I ask you to review what I should do? Below are the results of the command execution.
I know you're busy, but please take a look.
C:\AI\llama.cpp>pip install gguf
Defaulting to user installation because normal site-packages is not writeable
Collecting gguf
  Obtaining dependency information for gguf from https://files.pythonhosted.org/packages/bb/16/83a1cb95d9ec85bc316a1986481325c257a4a9a024e12bace801898db14e/gguf-0.2.1-py3-none-any.whl.metadata
  Downloading gguf-0.2.1-py3-none-any.whl.metadata (1.9 kB)
Requirement already satisfied: numpy>=1.17 in c:\users\hwyoo\appdata\roaming\python\python310\site-packages (from gguf) (1.23.5)
Downloading gguf-0.2.1-py3-none-any.whl (8.1 kB)
Installing collected packages: gguf
Successfully installed gguf-0.2.1
C:\AI\llama.cpp>python convert-llama-hf-to-gguf.py .\models\kfkas_Llama-2-ko-7b-Chat 1 gguf: loading model kfkas_Llama-2-ko-7b-Chat gguf: found 2 model parts gguf: get model metadata gguf: get tokenizer metadata gguf: get special token ids gguf: get tensor metadata gguf: loading model part 'pytorch_model-00001-of-00002.bin' token_embd.weight, n_dims = 2, torch.float16 --> float16 blk.0.attn_q.weight, n_dims = 2, torch.float16 --> float16 blk.0.attn_k.weight, n_dims = 2, torch.float16 --> float16 blk.0.attn_v.weight, n_dims = 2, torch.float16 --> float16 blk.0.attn_output.weight, n_dims = 2, torch.float16 --> float16 blk.0.ffn_gate.weight, n_dims = 2, torch.float16 --> float16 blk.0.ffn_down.weight, n_dims = 2, torch.float16 --> float16 blk.0.ffn_up.weight, n_dims = 2, torch.float16 --> float16 blk.0.attn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.0.ffn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.1.attn_q.weight, n_dims = 2, torch.float16 --> float16 blk.1.attn_k.weight, n_dims = 2, torch.float16 --> float16 blk.1.attn_v.weight, n_dims = 2, torch.float16 --> float16 blk.1.attn_output.weight, n_dims = 2, torch.float16 --> float16 blk.1.ffn_gate.weight, n_dims = 2, torch.float16 --> float16 blk.1.ffn_down.weight, n_dims = 2, torch.float16 --> float16 blk.1.ffn_up.weight, n_dims = 2, torch.float16 --> float16 blk.1.attn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.1.ffn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.2.attn_q.weight, n_dims = 2, torch.float16 --> float16 blk.2.attn_k.weight, n_dims = 2, torch.float16 --> float16 blk.2.attn_v.weight, n_dims = 2, torch.float16 --> float16 blk.2.attn_output.weight, n_dims = 2, torch.float16 --> float16 blk.2.ffn_gate.weight, n_dims = 2, torch.float16 --> float16 blk.2.ffn_down.weight, n_dims = 2, torch.float16 --> float16 blk.2.ffn_up.weight, n_dims = 2, torch.float16 --> float16 blk.2.attn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.2.ffn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.3.attn_q.weight, n_dims = 2, torch.float16 --> float16 blk.3.attn_k.weight, n_dims = 2, torch.float16 --> float16 blk.3.attn_v.weight, n_dims = 2, torch.float16 --> float16 blk.3.attn_output.weight, n_dims = 2, torch.float16 --> float16 blk.3.ffn_gate.weight, n_dims = 2, torch.float16 --> float16 blk.3.ffn_down.weight, n_dims = 2, torch.float16 --> float16 blk.3.ffn_up.weight, n_dims = 2, torch.float16 --> float16 blk.3.attn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.3.ffn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.4.attn_q.weight, n_dims = 2, torch.float16 --> float16 blk.4.attn_k.weight, n_dims = 2, torch.float16 --> float16 blk.4.attn_v.weight, n_dims = 2, torch.float16 --> float16 blk.4.attn_output.weight, n_dims = 2, torch.float16 --> float16 blk.4.ffn_gate.weight, n_dims = 2, torch.float16 --> float16 blk.4.ffn_down.weight, n_dims = 2, torch.float16 --> float16 blk.4.ffn_up.weight, n_dims = 2, torch.float16 --> float16 blk.4.attn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.4.ffn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.5.attn_q.weight, n_dims = 2, torch.float16 --> float16 blk.5.attn_k.weight, n_dims = 2, torch.float16 --> float16 blk.5.attn_v.weight, n_dims = 2, torch.float16 --> float16 blk.5.attn_output.weight, n_dims = 2, torch.float16 --> float16 blk.5.ffn_gate.weight, n_dims = 2, torch.float16 --> float16 blk.5.ffn_down.weight, n_dims = 2, torch.float16 --> float16 blk.5.ffn_up.weight, n_dims = 2, torch.float16 --> float16 blk.5.attn_norm.weight, 
n_dims = 1, torch.float16 --> float32 blk.5.ffn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.6.attn_q.weight, n_dims = 2, torch.float16 --> float16 blk.6.attn_k.weight, n_dims = 2, torch.float16 --> float16 blk.6.attn_v.weight, n_dims = 2, torch.float16 --> float16 blk.6.attn_output.weight, n_dims = 2, torch.float16 --> float16 blk.6.ffn_gate.weight, n_dims = 2, torch.float16 --> float16 blk.6.ffn_down.weight, n_dims = 2, torch.float16 --> float16 blk.6.ffn_up.weight, n_dims = 2, torch.float16 --> float16 blk.6.attn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.6.ffn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.7.attn_q.weight, n_dims = 2, torch.float16 --> float16 blk.7.attn_k.weight, n_dims = 2, torch.float16 --> float16 blk.7.attn_v.weight, n_dims = 2, torch.float16 --> float16 blk.7.attn_output.weight, n_dims = 2, torch.float16 --> float16 blk.7.ffn_gate.weight, n_dims = 2, torch.float16 --> float16 blk.7.ffn_down.weight, n_dims = 2, torch.float16 --> float16 blk.7.ffn_up.weight, n_dims = 2, torch.float16 --> float16 blk.7.attn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.7.ffn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.8.attn_q.weight, n_dims = 2, torch.float16 --> float16 blk.8.attn_k.weight, n_dims = 2, torch.float16 --> float16 blk.8.attn_v.weight, n_dims = 2, torch.float16 --> float16 blk.8.attn_output.weight, n_dims = 2, torch.float16 --> float16 blk.8.ffn_gate.weight, n_dims = 2, torch.float16 --> float16 blk.8.ffn_down.weight, n_dims = 2, torch.float16 --> float16 blk.8.ffn_up.weight, n_dims = 2, torch.float16 --> float16 blk.8.attn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.8.ffn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.9.attn_q.weight, n_dims = 2, torch.float16 --> float16 blk.9.attn_k.weight, n_dims = 2, torch.float16 --> float16 blk.9.attn_v.weight, n_dims = 2, torch.float16 --> float16 blk.9.attn_output.weight, n_dims = 2, torch.float16 --> float16 blk.9.ffn_gate.weight, n_dims = 2, torch.float16 --> float16 blk.9.ffn_down.weight, n_dims = 2, torch.float16 --> float16 blk.9.ffn_up.weight, n_dims = 2, torch.float16 --> float16 blk.9.attn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.9.ffn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.10.attn_q.weight, n_dims = 2, torch.float16 --> float16 blk.10.attn_k.weight, n_dims = 2, torch.float16 --> float16 blk.10.attn_v.weight, n_dims = 2, torch.float16 --> float16 blk.10.attn_output.weight, n_dims = 2, torch.float16 --> float16 blk.10.ffn_gate.weight, n_dims = 2, torch.float16 --> float16 blk.10.ffn_down.weight, n_dims = 2, torch.float16 --> float16 blk.10.ffn_up.weight, n_dims = 2, torch.float16 --> float16 blk.10.attn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.10.ffn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.11.attn_q.weight, n_dims = 2, torch.float16 --> float16 blk.11.attn_k.weight, n_dims = 2, torch.float16 --> float16 blk.11.attn_v.weight, n_dims = 2, torch.float16 --> float16 blk.11.attn_output.weight, n_dims = 2, torch.float16 --> float16 blk.11.ffn_gate.weight, n_dims = 2, torch.float16 --> float16 blk.11.ffn_down.weight, n_dims = 2, torch.float16 --> float16 blk.11.ffn_up.weight, n_dims = 2, torch.float16 --> float16 blk.11.attn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.11.ffn_norm.weight, n_dims = 1, torch.float16 --> float32 blk.12.attn_q.weight, n_dims = 2, torch.float16 --> float16 blk.12.attn_k.weight, n_dims = 2, torch.float16 --> float16 blk.12.attn_v.weight, n_dims = 2, torch.float16 
--> float16
blk.12.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.12.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.12.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.12.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.12.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
[... identical per-tensor conversion messages for blk.13 through the first tensors of blk.23 omitted: 2-D weights are written as float16, 1-D norm weights as float32 ...]
gguf: loading model part 'pytorch_model-00002-of-00002.bin'
[... identical per-tensor conversion messages for the rest of blk.23 through blk.31 omitted ...]
output_norm.weight, n_dims = 1, torch.float16 --> float32
output.weight, n_dims = 2, torch.float16 --> float16
gguf: write header
gguf: write metadata
gguf: write tensors
gguf: model successfully exported to '.\models\kfkas_Llama-2-ko-7b-Chat/ggml-model-f16.gguf'
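(As a quick sanity check on an exported .gguf before trying to load it, the fixed-size header can be read directly. The script below is only a sketch, not part of llama.cpp; it assumes the standard little-endian GGUF layout of a 4-byte "GGUF" magic, a uint32 version, and then the tensor and key-value counts, which were 32-bit in GGUF v1 and 64-bit in later versions. The file name on the command line is only an example.)

import struct
import sys

def gguf_header(path):
    # Read only the fixed-size GGUF header: magic, version, tensor count, KV count.
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"not a GGUF file (magic = {magic!r})")
        (version,) = struct.unpack("<I", f.read(4))
        # GGUF v1 stored the two counts as uint32; later versions use uint64.
        fmt = "<II" if version == 1 else "<QQ"
        tensor_count, kv_count = struct.unpack(fmt, f.read(struct.calcsize(fmt)))
    return version, tensor_count, kv_count

version, tensors, kvs = gguf_header(sys.argv[1])
print(f"GGUF version {version}: {tensors} tensors, {kvs} key-value pairs")

For example, python gguf_header.py a.gguf should report 291 tensors and 15 key-value pairs for this export, matching what the loader prints below.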
C:\AI\llama.cpp>main
main: build = 1100 (dd0dc36)
main: seed = 1693289567
llama_model_loader: loaded meta data with 15 key-value pairs and 291 tensors from models/7B/ggml-model-f16.gguf (version GGUF V1L����.llama_model_loader: - tensor 0: token_embd.weight f16 [ 4096, 46336, 1, 1 ]
llama_model_loader: - tensor 1: blk.0.attn_q.weight f16 [ 4096, 4096, 1, 1 ]
[... tensor listing for tensors 2 through 288 omitted; every block repeats the same attn_q/attn_k/attn_v/attn_output/ffn_gate/ffn_down/ffn_up (f16) and attn_norm/ffn_norm (f32) pattern ...]
llama_model_loader: - tensor 289: output_norm.weight f32 [ 4096, 1, 1, 1 ]
llama_model_loader: - tensor 290: output.weight f16 [ 4096, 46336, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: general.source.hugginface.repository str
llama_model_loader: - kv 3: llama.tensor_data_layout str
llama_model_loader: - kv 4: llama.context_length u32
llama_model_loader: - kv 5: llama.embedding_length u32
llama_model_loader: - kv 6: llama.block_count u32
llama_model_loader: - kv 7: llama.feed_forward_length u32
llama_model_loader: - kv 8: llama.rope.dimension_count u32
llama_model_loader: - kv 9: llama.attention.head_count u32
llama_model_loader: - kv 10: llama.attention.head_count_kv u32
llama_model_loader: - kv 11: llama.attention.layer_norm_rms_epsilon f32
llama_model_loader: - kv 12: tokenizer.ggml.bos_token_id u32
llama_model_loader: - kv 13: tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv 14: tokenizer.ggml.unknown_token_id u32
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type f16: 226 tensors
error loading model: key not found in model: tokenizer.ggml.tokens
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'models/7B/ggml-model-f16.gguf'
main: error: unable to load model
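(The failure above means the loader could not find the tokenizer.ggml.tokens metadata key, i.e. the vocabulary list was never written into the .gguf it opened. Since GGUF stores metadata key names as plain UTF-8 strings near the start of the file, one rough way to check which .gguf files actually contain the vocabulary is to scan the first chunk of each file for the literal key name. This is only a sketch; the paths are examples.)

import sys

KEY = b"tokenizer.ggml.tokens"

def has_vocab_key(path, scan_bytes=32 * 1024 * 1024):
    # GGUF places all metadata key-value pairs before the tensor data,
    # so the key name (if present) appears within the first chunk of the file.
    with open(path, "rb") as f:
        return KEY in f.read(scan_bytes)

for path in sys.argv[1:]:
    status = "present" if has_vocab_key(path) else "missing"
    print(f"{path}: {KEY.decode()} {status}")

For example: python check_vocab.py a.gguf models\7B\ggml-model-f16.gguf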