OpenNMT / CTranslate2

Fast inference engine for Transformer models
https://opennmt.net/CTranslate2
MIT License

fix: implement llama3 RoPE scaling type and fix converter #1751

Closed · ebraraktas closed 1 month ago

ebraraktas commented 2 months ago

Fixes #1745.

The implementation is ported from transformers.

ebraraktas commented 2 months ago

While implementing this, I noticed that the rotary embedding layer is shared among the layers of Llama3, and the Hugging Face implementation has been refactored to use that shared instance. Maybe we could implement this in CTranslate2 as well to reduce memory usage.

minhthuc2502 commented 1 month ago

Hello, thank you for your PR. I will fix the CI soon, and then you can rebase on master. What do you mean by "the rotary embedding layer is shared among layers of Llama3 and the huggingface implementation is refactored"? Can you add a link here?

ebraraktas commented 1 month ago

Thanks for the comment; I will rebase once you fix it. @minhthuc2502

For RoPE sharing:

minhthuc2502 commented 1 month ago

Ah, I understand your point. Actually, we can keep the current architecture, because doing this would require more changes in CTranslate2 than it did in HF.

minhthuc2502 commented 1 month ago

Could you please rebase on the master branch?

ebraraktas commented 1 month ago

As in this PR, a "no space left on device" error occurred during the Docker step of the CI/CD. @minhthuc2502