eole-nlp/eole

Open language modeling toolkit based on PyTorch
https://eole-nlp.github.io/eole
MIT License

Refacto convert HF #26

Closed: funboarder13920 closed 1 week ago

funboarder13920 commented 3 months ago

Hello,

Lina said that the goal was to be able to easily convert the main HF models. The Falcon model is going to be a bit challenging.

I took the initiative to refactor the HF convert script to get a clear idea of what changes the Falcon model conversion will require.

In addition to the Falcon support, I also plan to add tests, if I can find a way to generate lightweight HF models with the same architectures.

Is the goal to use the same method to load the different models? If not, the refactoring is not necessary.
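Regarding the lightweight test models mentioned above, here is a rough sketch of one way to generate a tiny, randomly initialised HF checkpoint for round-trip conversion tests (the transformers classes are real, but the dimensions and output path are illustrative, not a confirmed approach for this repo):

from transformers import LlamaConfig, LlamaForCausalLM

# Toy model in the same architecture family, small enough to regenerate in CI
# (all dimension values are illustrative)
config = LlamaConfig(
    vocab_size=1024,
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=2,
)
LlamaForCausalLM(config).save_pretrained("tiny-llama-for-convert-tests")  # hypothetical output dir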

vince62s commented 3 months ago

You are very welcome to work on this. As a reminder, there is an old version (that used to work in onmt-py) of the Falcon converter: https://github.com/eole-nlp/eole/blob/main/eole/bin/convert/convert_falcon.py. The same goes for T5, MPT, legacy llama (from Meta, not the HF version), redpajama, and xgen.

I already started on Falcon, so it should not be too difficult to integrate into convert_HF. Bear in mind that Falcon 7B and 40B (I have not tried 180B) have different architectures. The things to look at are: the way the fused query_key_value weights need to be split (it differs depending on the model, e.g. Phi3 is not the same as Falcon), and parallel_residual (Falcon was the reason I introduced this in onmt), along with shared_layer_norm.

It is almost impossible to test with a mini model. The only way to check that everything is OK is to redo the conversion and inference locally with a specific prompt for each model. It's a pain.
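To make the parallel_residual / shared_layer_norm point concrete, here is a minimal sketch of the Falcon-style block structure (nn.Linear placeholders stand in for the real attention and MLP blocks; this is not eole's actual implementation):

import torch
import torch.nn as nn

hidden = 64
attn, mlp = nn.Linear(hidden, hidden), nn.Linear(hidden, hidden)  # stand-ins for attention / MLP
ln_attn, ln_mlp = nn.LayerNorm(hidden), nn.LayerNorm(hidden)

def falcon_block(x, shared_layer_norm=False):
    # parallel_residual: attention and MLP both read the layer input,
    # and their outputs are summed into a single residual connection
    if shared_layer_norm:
        normed = ln_attn(x)  # Falcon-7B style: a single layer norm feeds both branches
        return x + attn(normed) + mlp(normed)
    return x + attn(ln_attn(x)) + mlp(ln_mlp(x))  # Falcon-40B style: separate ln_attn / ln_mlp

x = torch.randn(2, 5, hidden)
print(falcon_block(x).shape, falcon_block(x, shared_layer_norm=True).shape)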

vince62s commented 3 months ago

@funboarder13920 I found the code I had locally from when I started on Falcon; the key thing is here:

key_maps["FalconForCausalLM"] = {
    "layer_prefix": "transformer.h.",
    "decoder.embeddings.make_embedding.emb_luts.0.weight": "transformer.word_embeddings.weight",
    "decoder.layer_norm.weight": "transformer.ln_f.weight",
    "decoder.layer_norm.bias": "transformer.ln_f.bias",
    "generator.weight": "lm_head.weight",
    ".self_attn.linear_query.": (
        ".self_attention.query_key_value.",
        ".view(-1, heads // num_kv + 2, hidden_size // heads, hidden_size)[:, :-2].reshape(hidden_size, hidden_size)",  # noqa E501
    ),
    ".self_attn.linear_keys.": (
        ".self_attention.query_key_value.",
        ".view(-1, heads // num_kv + 2, hidden_size // heads, hidden_size)[:, [-2]].reshape(hidden_size // heads * num_kv, hidden_size)",  # noqa E501
    ),
    ".self_attn.linear_values.": (
        ".self_attention.query_key_value.",
        ".view(-1, heads // num_kv + 2, hidden_size // heads, hidden_size)[:, [-1]].reshape(hidden_size // heads * num_kv, hidden_size)",  # noqa E501
    ),
    ".self_attn.final_linear.": ".self_attention.dense.",
    ".feed_forward.w_1.": ".mlp.dense_h_to_4h.",
    ".feed_forward.w_2.": ".mlp.dense_4h_to_h.",
    ".layer_norm_1.weight": (".input_layernorm.weight", ".ln_attn.weight"),
    ".layer_norm_1.bias": (".input_layernorm.bias", ".ln_attn.bias"),
    ".layer_norm_res.weight": ".ln_mlp.weight",
    ".layer_norm_res.bias": ".ln_mlp.bias",
}
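
For reference, a quick sketch of what those view/reshape expressions do to a fused query_key_value weight (the dimensions below are made up for illustration, not a real Falcon size):

import torch

# Illustrative sizes only: 8 query heads grouped into 2 KV heads
heads, num_kv, hidden_size = 8, 2, 64
head_dim = hidden_size // heads

# Fused weight layout: num_kv groups of (heads // num_kv) query heads + 1 key head + 1 value head
fused = torch.randn(num_kv * (heads // num_kv + 2) * head_dim, hidden_size)

w = fused.view(-1, heads // num_kv + 2, hidden_size // heads, hidden_size)
q = w[:, :-2].reshape(hidden_size, hidden_size)                     # -> .self_attn.linear_query.
k = w[:, [-2]].reshape(hidden_size // heads * num_kv, hidden_size)  # -> .self_attn.linear_keys.
v = w[:, [-1]].reshape(hidden_size // heads * num_kv, hidden_size)  # -> .self_attn.linear_values.

print(q.shape, k.shape, v.shape)  # torch.Size([64, 64]) torch.Size([16, 64]) torch.Size([16, 64])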