OpenNMT / OpenNMT-py

Open Source Neural Machine Translation and (Large) Language Models in PyTorch
https://opennmt.net/
MIT License

Support for Microsoft's Phi-2 model #2548

Closed: vince62s closed this 8 months ago

vince62s commented 8 months ago

Just a note on this PR for future reference. Phi-2 from MSFT uses a rotary dimension (32) that differs from the per-head dimension (2560/32 = 80), which makes things a little awkward: rotary embeddings are applied only to the first 32 dimensions of each head, and the remaining dimensions (33 to 80) are just a plain copy.
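To make the awkward part concrete, here is a minimal pure-Python sketch of partial rotary embeddings applied to a single head vector. It assumes the GPT-NeoX-style "rotate-half" pairing (dimension i is rotated together with dimension i + rotary_dim/2) and a frequency base of 10000. The function name `partial_rotary` is hypothetical and this is not the OpenNMT-py implementation. Only the first `rotary_dim` entries are rotated, and the rest are copied through unchanged.

```python
import math
from typing import List


def partial_rotary(x: List[float], pos: int, rotary_dim: int,
                   base: float = 10000.0) -> List[float]:
    """Rotate the first rotary_dim entries of one head vector; copy the rest.

    Hypothetical helper for illustration. Assumes rotate-half pairing (i, i + half).
    """
    head_dim = len(x)
    if rotary_dim % 2 != 0 or rotary_dim > head_dim:
        raise ValueError("rotary_dim must be even and <= head_dim")
    half = rotary_dim // 2
    out = list(x)  # entries rotary_dim..head_dim-1 stay a plain copy
    for i in range(half):
        # Angle for pair i at this position: pos * base^(-2i / rotary_dim)
        theta = pos * base ** (-2 * i / rotary_dim)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]
        out[i] = a * c - b * s
        out[i + half] = b * c + a * s
    return out


# Phi-2 shapes from the note: hidden size 2560, 32 heads, rotary dim 32
hidden_size, n_heads, rotary_dim = 2560, 32, 32
head_dim = hidden_size // n_heads  # 80
q = [float(k) for k in range(head_dim)]
q_rot = partial_rotary(q, pos=5, rotary_dim=rotary_dim)
```

Because the rotation acts on pairs of coordinates, it preserves the norm of the first 32 entries, while entries 33 to 80 of `q_rot` are identical to those of `q`.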

NOTE 2: I am trying to build a generic converter, convert_HF.py, which for now is compatible with Llama, Mistral and Phi. I hope to add other model types and, in the end, have only a single converter.