ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Error converting new stablelm-2-12b-chat #6553

Closed: bartowski1182 closed this issue 5 months ago

bartowski1182 commented 5 months ago

Using version b2589

Attempting convert-hf-to-gguf.py on

https://huggingface.co/stabilityai/stablelm-2-12b-chat

Results in error:

Can not map tensor 'model.layers.0.self_attn.k_layernorm.norms.0.weight'

Galunid commented 5 months ago

The 12B parameter model is not supported yet; I'll take a look later today.

Galunid commented 5 months ago

It looks like they are using a per-head layer norm in their implementation. It's not supported in llama.cpp, as far as I know. I'm not planning to implement it, since the model doesn't seem that good.
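
For context, a minimal PyTorch sketch of what a per-head q/k layer norm layout could look like (an assumption for illustration, not StableLM's actual code), and why it produces tensor names the converter cannot map:

```python
# Hypothetical sketch: the HF checkpoint appears to store one LayerNorm per
# attention head in a ModuleList, so the state dict contains keys like
# "...k_layernorm.norms.0.weight" rather than the single
# "...k_layernorm.weight" that the converter's tensor map knows about.
import torch
import torch.nn as nn

class PerHeadLayerNorm(nn.Module):
    def __init__(self, head_dim: int, num_heads: int):
        super().__init__()
        # one norm per head -> state dict keys norms.0.weight, norms.1.weight, ...
        self.norms = nn.ModuleList(nn.LayerNorm(head_dim) for _ in range(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, num_heads, head_dim); normalize each head's slice separately
        return torch.stack(
            [norm(x[:, :, i, :]) for i, norm in enumerate(self.norms)], dim=2
        )
```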

MoonRide303 commented 5 months ago

Got exactly the same error; cannot make a GGUF from it (using convert-hf-to-gguf.py).

Loading model: stablelm-2-12b-chat
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
gguf: Adding 100000 merge(s).
gguf: Setting special token type bos to 100257
gguf: Setting special token type eos to 100257
gguf: Setting special token type unk to 100257
gguf: Setting special token type pad to 100257
gguf: Setting chat_template to {% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = 'You are a helpful assistant.' %}{% endif %}{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in loop_messages %}{% if loop.index0 == 0 %}{{'<|im_start|>system
' + system_message + '<|im_end|>
'}}{% endif %}{{'<|im_start|>' + message['role'] + '
' + message['content'] + '<|im_end|>' + '
'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant
' }}{% endif %}
Exporting model to 'StableLM-2-12B-Chat-F16.gguf'
gguf: loading model part 'model-00001-of-00005.safetensors'
token_embd.weight, n_dims = 2, torch.bfloat16 --> float16
blk.0.attn_norm.bias, n_dims = 1, torch.bfloat16 --> float32
blk.0.attn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
blk.0.ffn_down.weight, n_dims = 2, torch.bfloat16 --> float16
blk.0.ffn_gate.weight, n_dims = 2, torch.bfloat16 --> float16
blk.0.ffn_up.weight, n_dims = 2, torch.bfloat16 --> float16
Can not map tensor 'model.layers.0.self_attn.k_layernorm.norms.0.weight'

@Galunid, judging from https://stability.ai/news/introducing-stable-lm-2-12b, it doesn't look bad for its weight class: [benchmark chart from the announcement]

slaren commented 5 months ago

It looks like they are using a per head layer norm in their implementation. It's not supported in llama.cpp afaik.

This may be the same thing that Command R+ does. llama.cpp supports it, but you have to be careful to reshape q/k to 3D before applying the norm, and to export the norm weights as f32.
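
A minimal sketch of that idea on the conversion side (an illustration of the approach, not the actual patch in the eventual PR): stack the per-head norm weights into a single tensor so they map to one GGUF tensor, and keep them in float32.

```python
# Illustrative only: collect k_layernorm.norms.{i}.weight from the HF state
# dict and stack them into one (num_heads, head_dim) array, kept as float32
# as suggested above, so it can be written as a single GGUF tensor.
import numpy as np
import torch

def stack_per_head_norm(state_dict: dict, prefix: str, num_heads: int) -> np.ndarray:
    heads = [
        state_dict[f"{prefix}.norms.{i}.weight"].to(torch.float32).numpy()
        for i in range(num_heads)
    ]
    return np.stack(heads, axis=0)  # shape (num_heads, head_dim), dtype float32
```

At inference time q/k would then be viewed as (n_tokens, n_heads, head_dim) before the norm is applied, matching the note above about reshaping to 3D.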

IzzyHibbert commented 5 months ago

+1, same issue here, just doing exactly the same conversion attempt.

[Screenshot from 2024-04-11 showing the same conversion error]

Edit: @Galunid, as already reported above, I also believe this model is not that bad (but we need to try it first).

ashishdatta commented 5 months ago

@bartowski1182 @IzzyHibbert I added a PR with a working solution: https://github.com/ggerganov/llama.cpp/pull/6635. I've tested with 12B and 12B chat. Feel free to try; note that you have to use this branch of the model for now: https://huggingface.co/stabilityai/stablelm-2-12b/tree/stack-per-head-qk-norm

@ggerganov Can you please assign this issue to me so I can track it? Thanks!

Galunid commented 5 months ago

Closed in #6635.