Closed: bartowski1182 closed this issue 5 months ago.

Using version b2589. Attempting `convert-hf-to-gguf.py` on https://huggingface.co/stabilityai/stablelm-2-12b-chat results in the error:

```
Can not map tensor 'model.layers.0.self_attn.k_layernorm.norms.0.weight'
```
The 12B parameter model is not supported yet; I'll take a look later today.
It looks like they are using a per-head layer norm in their implementation. It's not supported in llama.cpp afaik. I'm not planning to implement it, since the model doesn't seem that good.
Got exactly the same error; I cannot make a GGUF from it (using `convert-hf-to-gguf.py`).
```
Loading model: stablelm-2-12b-chat
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
gguf: Adding 100000 merge(s).
gguf: Setting special token type bos to 100257
gguf: Setting special token type eos to 100257
gguf: Setting special token type unk to 100257
gguf: Setting special token type pad to 100257
gguf: Setting chat_template to {% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = 'You are a helpful assistant.' %}{% endif %}{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in loop_messages %}{% if loop.index0 == 0 %}{{'<|im_start|>system
' + system_message + '<|im_end|>
'}}{% endif %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
Exporting model to 'StableLM-2-12B-Chat-F16.gguf'
gguf: loading model part 'model-00001-of-00005.safetensors'
token_embd.weight, n_dims = 2, torch.bfloat16 --> float16
blk.0.attn_norm.bias, n_dims = 1, torch.bfloat16 --> float32
blk.0.attn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
blk.0.ffn_down.weight, n_dims = 2, torch.bfloat16 --> float16
blk.0.ffn_gate.weight, n_dims = 2, torch.bfloat16 --> float16
blk.0.ffn_up.weight, n_dims = 2, torch.bfloat16 --> float16
Can not map tensor 'model.layers.0.self_attn.k_layernorm.norms.0.weight'
```
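As a side note, the chat_template in the log above is standard ChatML. A minimal sketch of rendering it with jinja2 (the template string is copied verbatim from the log; the sample messages are just for illustration):

```python
# Render the ChatML chat_template from the conversion log with jinja2.
from jinja2 import Template

CHAT_TEMPLATE = """{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = 'You are a helpful assistant.' %}{% endif %}{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in loop_messages %}{% if loop.index0 == 0 %}{{'<|im_start|>system
' + system_message + '<|im_end|>
'}}{% endif %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}"""

print(Template(CHAT_TEMPLATE).render(
    messages=[{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True,
))
# Expected output:
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# Hello!<|im_end|>
# <|im_start|>assistant
```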
@Galunid, going by the benchmarks at https://stability.ai/news/introducing-stable-lm-2-12b, it doesn't look bad for its weight class.
> It looks like they are using a per-head layer norm in their implementation. It's not supported in llama.cpp afaik.

This may be the same thing that Command R Plus does. llama.cpp supports it, but you have to be careful to reshape q/k to 3D before doing the norm, and to export the norm as f32.
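A minimal PyTorch sketch of what that looks like (the shapes, the epsilon, and leaving out a bias are assumptions for illustration, not llama.cpp's actual code):

```python
import torch

def per_head_layernorm(x: torch.Tensor, norm_weight: torch.Tensor) -> torch.Tensor:
    # x: (seq_len, n_head * head_dim), the q or k projection output
    # norm_weight: (n_head, head_dim), a separate weight vector per head
    seq_len = x.shape[0]
    n_head, head_dim = norm_weight.shape
    x = x.view(seq_len, n_head, head_dim)        # reshape to 3D before the norm
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, unbiased=False, keepdim=True)
    x = (x - mean) / torch.sqrt(var + 1e-5)      # LayerNorm over head_dim only
    x = x * norm_weight                          # per-head scale, broadcasts over seq_len
    return x.reshape(seq_len, n_head * head_dim)
```

Note in the log above that 1-D norm tensors are already exported as float32 while 2-D tensors default to float16; a stacked per-head norm is 2-D, which is why it needs explicit care to stay f32.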
+1, same issue here; I was attempting exactly the same conversion.
Edit: @Galunid, as already reported above, I also believe that this model is not that bad (but we need to try it first).
@bartowski1182 @IzzyHibbert I opened a PR with a working solution: https://github.com/ggerganov/llama.cpp/pull/6635. I've tested it with 12B and 12B Chat. Feel free to try; note that for now you have to use this branch of the model: https://huggingface.co/stabilityai/stablelm-2-12b/tree/stack-per-head-qk-norm
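For anyone curious, the branch name suggests the per-head norm weights get stacked into one tensor per layer so they can be mapped. A hypothetical sketch of that stacking step (key names are inferred from the error message; this is not the PR's actual code):

```python
import torch

def stack_per_head_qk_norm(state_dict: dict, layer: int, n_head: int,
                           which: str = "k_layernorm") -> torch.Tensor:
    # Gather model.layers.{layer}.self_attn.{which}.norms.{h}.weight for
    # every head and stack them into a single (n_head, head_dim) tensor.
    norms = [
        state_dict.pop(f"model.layers.{layer}.self_attn.{which}.norms.{h}.weight")
        for h in range(n_head)
    ]
    # Keep float32 so the now-2-D tensor isn't down-converted to f16 on export.
    return torch.stack(norms, dim=0).float()
```

For k_layernorm, n_head would presumably be the KV-head count rather than the attention-head count.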
@ggerganov Can you please assign this issue to me so I can track it? Thanks!
closed in #6635