Open zhiyuanyou opened 2 weeks ago
Hello,
Thanks for your wonderful work. I am doing some testing with your code. However, I found a very strange problem.
I want to print the weight shape of lm_head (https://github.com/X-PLUG/mPLUG-Owl/blob/main/mPLUG-Owl2/mplug_owl2/model/modeling_mplug_owl2.py#L220) with the following code.
print("Before initializing lm_head: ", config.hidden_size, config.vocab_size)
self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
print("After initializing lm_head: ", config.hidden_size, config.vocab_size)
print("weight shape: ", self.lm_head.weight.shape)
The results are:
Before initializing lm_head: 4096 32000
After initializing lm_head: 4096 32000
weight shape: torch.Size([0])
I am very confused about why lm_head.weight.shape is torch.Size([0]) rather than the expected [32000, 4096]. Do you have any insight into this problem?
Monitoring this parameter is very important for me, but I cannot access it during training.
Thanks.
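Are you using the DeepSpeed ZeRO-3 strategy to initialize the model? If so, the parameters are partitioned across ranks (and may be offloaded), so the local weight can be an empty placeholder, which would explain torch.Size([0]).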
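If ZeRO-3 is indeed the cause, the original shape should still be recoverable. The sketch below assumes that DeepSpeed attaches a ds_shape attribute to each partitioned parameter; the helper name full_shape and the stand-in objects are made up for illustration and are not part of mPLUG-Owl or DeepSpeed.

```python
from types import SimpleNamespace


def full_shape(param):
    """Return the unpartitioned shape of a parameter as a tuple.

    Assumption: under DeepSpeed ZeRO-3 the parameter carries a `ds_shape`
    attribute with its original shape, while `param.shape` is the empty
    local placeholder. Without ZeRO-3, `ds_shape` is absent and the
    regular shape is returned instead.
    """
    ds_shape = getattr(param, "ds_shape", None)
    return tuple(ds_shape) if ds_shape is not None else tuple(param.shape)


# Stand-ins for illustration only (not real torch or DeepSpeed objects).
partitioned = SimpleNamespace(shape=(0,), ds_shape=(32000, 4096))
regular = SimpleNamespace(shape=(32000, 4096))

print(full_shape(partitioned))
print(full_shape(regular))
```

In the model code this would be called as full_shape(self.lm_head.weight). To inspect the actual weight values rather than just the shape, the parameter likely needs to be gathered first, for example inside a deepspeed.zero.GatheredParameters(self.lm_head.weight) context.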