OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

Llama 3.1 support please? #1745

Closed: BBC-Esq closed this issue 3 weeks ago

BBC-Esq commented 1 month ago

Hello again! If you plan on supporting Llama 3.1, please note that it requires a new category of RoPE scaling. Thanks!

https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct/blob/main/config.json

avan06 commented 1 month ago

I noticed that Llama 3.1 sets the rope_scaling parameter in config.json and renames the type field to rope_type. In the previous Llama 3, rope_scaling was null:

  "rope_scaling": {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3"
  },

This causes a KeyError during conversion in ctranslate2/converters/transformers.py wherever it reads rope_scaling["type"]. That part is easy to fix by falling back to the old key:

# Prefer the new Llama 3.1 "rope_type" key, falling back to the legacy "type" key.
rope_type = rope_scaling["rope_type"] if "rope_type" in rope_scaling else rope_scaling["type"]
rotary_scaling_type = _SUPPORTED_ROPE_SCALING.get(rope_type)
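
Checking for the new rope_type key first keeps the lookup backward compatible with older checkpoints whose config still uses type.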

But this alone is not sufficient, because the llama3 RoPE scaling type itself has not been implemented. The conversion therefore fails with the following error:

NotImplementedError: RoPE scaling type 'llama3' is not yet implemented. The following RoPE scaling types are currently supported: linear, su

I don't know how to implement it myself, but those who are interested can refer to the implementation in transformers: https://github.com/huggingface/transformers/blob/1c122a46dc3c4448901f8d2f3018d9d58b846ba5/src/transformers/modeling_rope_utils.py#L298
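
For reference, here is a rough, self-contained sketch of what that transformers code computes, using the values from the config above (factor, low_freq_factor, high_freq_factor, original_max_position_embeddings). The function name llama3_scale_inv_freq is made up for illustration, and the rope_theta of 500000.0 in the usage lines is an assumption about the model config; this is not a drop-in patch for CTranslate2, just the math:

import math

def llama3_scale_inv_freq(inv_freq, factor=8.0, low_freq_factor=1.0,
                          high_freq_factor=4.0,
                          original_max_position_embeddings=8192):
    # Rescale each RoPE inverse frequency according to its wavelength,
    # mirroring the llama3 branch of modeling_rope_utils.py in transformers.
    low_freq_wavelen = original_max_position_embeddings / low_freq_factor
    high_freq_wavelen = original_max_position_embeddings / high_freq_factor

    scaled = []
    for freq in inv_freq:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # High-frequency components are left untouched.
            scaled.append(freq)
        elif wavelen > low_freq_wavelen:
            # Low-frequency components are scaled down by the full factor.
            scaled.append(freq / factor)
        else:
            # In between, interpolate smoothly between the two regimes.
            smooth = (original_max_position_embeddings / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * freq / factor + smooth * freq)
    return scaled

# Usage: standard RoPE inverse frequencies for a head dim of 128
# (rope_theta=500000.0 is an assumption; check the model config).
inv_freq = [500000.0 ** (-2.0 * i / 128) for i in range(64)]
inv_freq = llama3_scale_inv_freq(inv_freq)

The net effect is that only the low-frequency (long-wavelength) dimensions are compressed, which is how the llama3 type differs from the plain linear scaling that is already supported.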

BBC-Esq commented 1 month ago

Also, here's the proposed (WIP) implementation in llama.cpp, in case it also helps:

https://github.com/ggerganov/llama.cpp/commit/b5e95468b1676e1e5c9d80d1eeeb26f542a38f42