epfLLM / Megatron-LLM

distributed trainer for LLMs

One question about the permute function code in permute_qkv.py #89

Open drxmy opened 8 months ago

drxmy commented 8 months ago

I am trying to convert baichuan2-megatron to HF. While reading the code, I cannot understand this part:

def permute(x):
    if revert:
        return x.view(head_dim//2, 2, dim).transpose(0, 1).reshape(head_dim, dim)
    return x.view(2, head_dim//2, dim).transpose(0, 1).reshape(head_dim, dim)

Why head_dim//2? I would really appreciate it if someone could explain this.

martinjaggi commented 6 months ago

Could you be more precise about the question, and say which code file and model architecture you're referring to?

drxmy commented 6 months ago

> Could you be more precise about the question, and say which code file and model architecture you're referring to?

I trained a Baichuan2 model (https://huggingface.co/baichuan-inc/Baichuan2-7B-Base) with Megatron-LM and want to convert it back to Hugging Face format.

I don't understand this part of the code, which I ran into while trying to add support for Baichuan2: https://github.com/epfLLM/Megatron-LLM/blob/1b06b129fa463b7bfce88ef49e2082f8df00c7fa/weights_conversion/utils/permute_qkv.py#L15C3-L18C82
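
For anyone reading along, here is a minimal sketch of what the quoted `permute(x)` appears to do to the rows of one head's weight block. It tracks row indices only, in plain Python, and mirrors the `view` / `transpose(0, 1)` / `reshape` chain step by step. The helper name `row_order` is hypothetical and is not part of the repository. One plausible reading (an assumption, not confirmed in this thread) is that `head_dim//2` appears because rotary position embeddings act on `head_dim//2` pairs of rows, and the two checkpoint formats store those pairs in different orders: first half / second half versus interleaved.

```python
def row_order(head_dim: int, revert: bool = False) -> list[int]:
    """Return, for each output row, the index of the input row placed there.

    Hypothetical helper: it mimics view -> transpose(0, 1) -> reshape from
    the quoted permute(), applied to row indices only. The trailing `dim`
    axis is carried along unchanged by those calls, so it is omitted here.
    """
    half = head_dim // 2
    rows = list(range(head_dim))
    if revert:
        # view(head_dim//2, 2, dim): group the rows into adjacent pairs
        grid = [rows[i * 2:(i + 1) * 2] for i in range(half)]
    else:
        # view(2, head_dim//2, dim): split the rows into two halves
        grid = [rows[i * half:(i + 1) * half] for i in range(2)]
    # transpose(0, 1) swaps the two leading axes; reshape flattens row-major
    return [r for group in zip(*grid) for r in group]


forward = row_order(8)                 # [0, 4, 1, 5, 2, 6, 3, 7]
backward = row_order(8, revert=True)   # [0, 2, 4, 6, 1, 3, 5, 7]

# Applying the forward permutation and then the reverted one restores
# the original row order.
round_trip = [forward[j] for j in backward]
print(forward, backward, round_trip)
```

With `head_dim = 8`, the forward direction interleaves the two halves (row 0, row 4, row 1, row 5, ...), and `revert=True` undoes that by collecting the even positions first and the odd positions second. Whether Baichuan2 needs the same reordering depends on how its attention layers lay out the rotary dimensions, which is worth checking against the model code before reusing the function.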