X-PLUG / mPLUG-Owl

mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
https://www.modelscope.cn/studios/damo/mPLUG-Owl
MIT License

Can not obtain the lm_head.weight #242

Open zhiyuanyou opened 2 weeks ago

zhiyuanyou commented 2 weeks ago

Hello,

Thanks for your wonderful work. I am doing some testing with your code. However, I found a very strange problem.

I want to print the weight shape of lm_head (https://github.com/X-PLUG/mPLUG-Owl/blob/main/mPLUG-Owl2/mplug_owl2/model/modeling_mplug_owl2.py#L220) with the following code:

        print("Before initializing lm_head: ", config.hidden_size, config.vocab_size)
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
        print("After initializing lm_head: ", config.hidden_size, config.vocab_size)
        print("weight shape: ", self.lm_head.weight.shape)

The results are:

Before initializing lm_head:  4096 32000
After initializing lm_head:  4096 32000 
weight shape:  torch.Size([0])

I am very confused about why lm_head.weight.shape is torch.Size([0]) when the layer was created with hidden_size 4096 and vocab_size 32000. Do you have any insight into this problem?

Monitoring this parameter is very important for me, but I cannot access its values during training.

Thanks.

LukeForeverYoung commented 4 days ago

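Are you using the ZeRO-3 strategy (DeepSpeed ZeRO stage 3) to initialize the model? If so, the parameters may be partitioned or offloaded, so lm_head.weight would only hold an empty placeholder at that point.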
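
If ZeRO-3 is indeed the cause, the weight can probably still be inspected. The following is only a minimal sketch, not the repository's own method. It assumes DeepSpeed has attached its usual `ds_shape` attribute to partitioned parameters and that `deepspeed.zero.GatheredParameters` is available in the installed version. The helper names `is_zero3_partitioned`, `full_shape`, and `gathered` are hypothetical and exist only for illustration.

```python
from contextlib import nullcontext


def is_zero3_partitioned(param):
    """Return True if the parameter carries ZeRO-3 partitioning metadata.

    Assumption: DeepSpeed ZeRO-3 attaches a `ds_shape` attribute to every
    parameter it partitions.
    """
    return hasattr(param, "ds_shape")


def full_shape(param):
    """Return the logical (unpartitioned) shape of a parameter as a tuple."""
    if is_zero3_partitioned(param):
        # The local tensor is an empty placeholder; the real shape is
        # recorded in the metadata.
        return tuple(param.ds_shape)
    return tuple(param.shape)


def gathered(params):
    """Return a context manager that makes the full weights readable.

    Without ZeRO-3 this is a no-op. With ZeRO-3, the parameters are gathered
    on entry and re-partitioned on exit.
    """
    params = list(params)
    if any(is_zero3_partitioned(p) for p in params):
        import deepspeed  # imported lazily; only needed under ZeRO-3

        # modifier_rank=None means the weights are only read, not modified.
        return deepspeed.zero.GatheredParameters(params, modifier_rank=None)
    return nullcontext()


# Hypothetical usage inside the training script, where `model` is the
# mPLUG-Owl2 model:
#
#     print("logical shape:", full_shape(model.lm_head.weight))
#     with gathered([model.lm_head.weight]):
#         print("gathered shape:", model.lm_head.weight.shape)
#         print("weight norm:", model.lm_head.weight.norm().item())
```

Gathering a 32000 x 4096 matrix on every rank costs memory and communication, so for monitoring it is probably better to do this every N steps rather than on every iteration.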