Open drxmy opened 8 months ago
could you be more precise on the question, and which code file and model architecture you're referring to?
I trained a baichuan2 model(https://huggingface.co/baichuan-inc/Baichuan2-7B-Base) with Megatron-LM and want to convert it back to huggingface format.
I don't understand this part of the code while trying to add support for baichuan2: https://github.com/epfLLM/Megatron-LLM/blob/1b06b129fa463b7bfce88ef49e2082f8df00c7fa/weights_conversion/utils/permute_qkv.py#L15C3-L18C82
I am trying to convert baichuan2 weights from Megatron format to HF format. When reading the code, I cannot understand this part.
Why `head_dim // 2`? I'd really appreciate it if someone could explain this.
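Not the repo's exact code, but here is my understanding of where `head_dim // 2` comes from (a hedged sketch; the function name `interleaved_to_split` and the shapes below are my own illustration, not from the repo). Megatron-style checkpoints typically store each head's rotary pairs *interleaved*: rows (0, 1), (2, 3), ... form the rotation pairs. HuggingFace's LLaMA-style `rotate_half` instead pairs row `i` with row `i + head_dim // 2` (the *split-half* layout). Converting Q/K weights between the two conventions therefore regroups rows in blocks of size `head_dim // 2`:

```python
import numpy as np

def interleaved_to_split(w, n_heads, head_dim):
    """Permute per-head rows from interleaved rotary layout to split-half layout.

    w: weight of shape (n_heads * head_dim, in_dim).
    Within each head, rows (0, 1, 2, 3, ...) become
    (0, 2, 4, ..., 1, 3, 5, ...): even rows first, then odd rows,
    each block having head_dim // 2 rows.
    """
    out_dim, in_dim = w.shape
    # view as (heads, head_dim//2 pairs, 2 elements per pair, in_dim)
    w = w.reshape(n_heads, head_dim // 2, 2, in_dim)
    # swap the "pair index" and "element within pair" axes
    w = w.transpose(0, 2, 1, 3)  # -> (heads, 2, head_dim//2, in_dim)
    return w.reshape(out_dim, in_dim)

# toy example: 2 heads, head_dim=4, so each head has 4 rows
n_heads, head_dim, in_dim = 2, 4, 3
w = np.arange(n_heads * head_dim * in_dim).reshape(n_heads * head_dim, in_dim)
p = interleaved_to_split(w, n_heads, head_dim)
# within head 0, rows (0, 1, 2, 3) are reordered to (0, 2, 1, 3)
```

So the `head_dim // 2` is the size of each half of a head after separating the even-indexed (cos) and odd-indexed (sin) rotary components, which is exactly the block size `rotate_half` operates on in the HF implementation.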