Closed: funboarder13920 closed this issue 1 week ago
You are very welcome to work on this. As a reminder, there is an old version (that used to work in onmt-py) of the Falcon converter: https://github.com/eole-nlp/eole/blob/main/eole/bin/convert/convert_falcon.py — same for T5, MPT, legacy Llama (from Meta, not the HF version), RedPajama, and XGen.
For Falcon I already started, so it should not be too difficult to integrate into convert_HF. Bear in mind that Falcon 7B and 40B (I have not tried 180B) have different architectures. The things to look at are:
- the way the fused query_key_value weights need to be split (it differs between models, e.g. Phi3 is not the same as Falcon)
- parallel_residual (Falcon was the reason I introduced this in onmt), along with shared_layer_norm
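For reference, the parallel_residual layout (with and without a shared layer norm) looks roughly like this. This is a minimal NumPy sketch, not the eole implementation: `attn` and `mlp` are stand-in callables, and the layer norm is parameter-free, which is only enough to illustrate the data flow.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Parameter-free layer norm, enough to illustrate the data flow.
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def parallel_residual_block(x, attn, mlp, shared_layer_norm=True):
    if shared_layer_norm:
        # Falcon-7B style: a single input_layernorm feeds both branches,
        # and the attention and MLP branches are summed in parallel.
        h = layer_norm(x)
        return x + attn(h) + mlp(h)
    # Falcon-40B style: separate ln_attn / ln_mlp (distinct learned weights
    # in the real model; identical here because this sketch has no parameters).
    return x + attn(layer_norm(x)) + mlp(layer_norm(x))

# Tiny demo: layer_norm of an all-ones input is all zeros, so with these
# stand-in branches the block just returns its input.
y = parallel_residual_block(np.ones((2, 4)), attn=lambda h: h, mlp=lambda h: 2.0 * h)
```

The point is that, unlike a sequential pre-norm block (`x + mlp(ln2(x + attn(ln1(x))))`), both branches read the same residual stream, which is why the converter needs the parallel_residual and shared_layer_norm flags.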
It is almost impossible to test with a mini model. The only way to check that everything is OK is to redo the conversion and inference locally with a specific prompt for each model. It's a pain.
@funboarder13920 I found my code locally when I started to do it for Falcon, the key thing is here:
```python
key_maps["FalconForCausalLM"] = {
    "layer_prefix": "transformer.h.",
    "decoder.embeddings.make_embedding.emb_luts.0.weight": "transformer.word_embeddings.weight",
    "decoder.layer_norm.weight": "transformer.ln_f.weight",
    "decoder.layer_norm.bias": "transformer.ln_f.bias",
    "generator.weight": "lm_head.weight",
    ".self_attn.linear_query.": (
        ".self_attention.query_key_value.",
        ".view(-1, heads // num_kv + 2, hidden_size // heads, hidden_size)[:, :-2].reshape(hidden_size, hidden_size)",  # noqa E501
    ),
    ".self_attn.linear_keys.": (
        ".self_attention.query_key_value.",
        ".view(-1, heads // num_kv + 2, hidden_size // heads, hidden_size)[:, [-2]].reshape(hidden_size // heads * num_kv, hidden_size)",  # noqa E501
    ),
    ".self_attn.linear_values.": (
        ".self_attention.query_key_value.",
        ".view(-1, heads // num_kv + 2, hidden_size // heads, hidden_size)[:, [-1]].reshape(hidden_size // heads * num_kv, hidden_size)",  # noqa E501
    ),
    ".self_attn.final_linear.": ".self_attention.dense.",
    ".feed_forward.w_1.": ".mlp.dense_h_to_4h.",
    ".feed_forward.w_2.": ".mlp.dense_4h_to_h.",
    ".layer_norm_1.weight": (".input_layernorm.weight", ".ln_attn.weight"),
    ".layer_norm_1.bias": (".input_layernorm.bias", ".ln_attn.bias"),
    ".layer_norm_res.weight": ".ln_mlp.weight",
    ".layer_norm_res.bias": ".ln_mlp.bias",
}
```
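The three view/reshape expressions above can be exercised on a toy tensor to see how the fused query_key_value weight is sliced into query, key, and value parts. Here is a NumPy sketch with arbitrary tiny dimensions (torch's `.view` corresponds to `reshape` here): per kv group, the fused weight stacks `heads // num_kv` query heads followed by one key head and one value head.

```python
import numpy as np

# Hypothetical tiny dimensions, chosen only for illustration.
heads, num_kv, hidden_size = 4, 2, 8
head_dim = hidden_size // heads  # 2

# Fused query_key_value weight: num_kv groups, each holding
# (heads // num_kv) query heads plus one key head and one value head.
rows = num_kv * (heads // num_kv + 2) * head_dim  # 16
qkv = np.arange(rows * hidden_size, dtype=np.float32).reshape(rows, hidden_size)

# Mirror the converter's expressions: group, then slice off the
# last two "heads" of each group (key and value).
grouped = qkv.reshape(-1, heads // num_kv + 2, head_dim, hidden_size)
q = grouped[:, :-2].reshape(hidden_size, hidden_size)            # all query heads
k = grouped[:, [-2]].reshape(head_dim * num_kv, hidden_size)     # one key head per group
v = grouped[:, [-1]].reshape(head_dim * num_kv, hidden_size)     # one value head per group
```

Note how the query projection keeps the full `hidden_size` rows while key and value only keep `head_dim * num_kv` rows each — this is the grouped/multi-query layout that makes the Falcon split different from, say, Phi3's.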
Hello,
Lina said that the goal is to be able to easily convert the main HF models. The Falcon model is going to be a bit challenging.
I took the initiative to refactor the HF conversion to get a clear idea of what changes Falcon support will require.
In addition to Falcon support, I also plan to add tests, if I can find a way to generate lightweight models with the same architectures as the HF ones.
Is the goal to use the same method to load the different models? If not, then the refactor is not necessary.