baichuan-inc / Baichuan2

A series of large language models developed by Baichuan Intelligent Technology
https://huggingface.co/baichuan-inc
Apache License 2.0

Fine-tuning Baichuan2 raises no attribute named "future_mask" #39

Open CarolXh opened 1 year ago

CarolXh commented 1 year ago

I'm fine-tuning with the transformers Trainer class, and every time it reaches an eval step it fails with the following error:

AttributeError: Caught AttributeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/peft/peft_model.py", line 931, in forward
    return self.base_model(
  File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 94, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/uos/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Chat/modeling_baichuan.py", line 692, in forward
    outputs = self.model(
  File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/home/uos/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Chat/modeling_baichuan.py", line 404, in forward
    alibi_mask = self.get_alibi_mask(inputs_embeds, seq_length_with_past)
  File "/home/uos/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Chat/modeling_baichuan.py", line 354, in get_alibi_mask
    mask = self.future_mask[
  File "/home/uos/miniconda3/envs/llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'BaichuanModel' object has no attribute 'future_mask'

I then switched to LLaMA-Efficient-Tuning and fine-tuned Baichuan2 the same way I had tuned Baichuan1, this time with DeepSpeed; again it fails at the eval step, with: AttributeError: 'Parameter' object has no attribute 'ds_status'. What is causing this?

mmmans commented 1 year ago

Seems like it is training in eval mode. Check /home/uos/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Chat/modeling_baichuan.py, line 354. Maybe you should call model.train() before training.
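
For anyone trying this suggestion, a minimal sketch of toggling the mode explicitly with the standard nn.Module API (the model here is a stand-in for the actual Baichuan2 model):

import torch.nn as nn

model = nn.Linear(4, 4)  # stand-in for the actual model
model.train()            # put the module in training mode before trainer updates
assert model.training
model.eval()             # switch to eval mode for validation/inference
assert not model.training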

zwx980624 commented 1 year ago

Same question: fine-tuning Baichuan2 with DeepSpeed also fails at the eval step with AttributeError: 'Parameter' object has no attribute 'ds_status':

Traceback (most recent call last):
  File "main.py", line 525, in <module>
    main(run_args)
  File "main.py", line 419, in main
    perplexity = evaluation(model, eval_dataloader)
  File "main.py", line 350, in evaluation
    outputs = model(**batch)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1801, in forward
    loss = self.module(*inputs, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/Baichuan2-7B-Base/modeling_baichuan.py", line 697, in forward
    logits = self.lm_head(hidden_states)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/Baichuan2-7B-Base/modeling_baichuan.py", line 508, in forward
    norm_weight = self.weight
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1605, in __getattr__
    return _parameters[name]
  File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/zero/parameter_offload.py", line 132, in __getitem__
    if param.ds_status == ZeroParamStatus.NOT_AVAILABLE:
AttributeError: 'Parameter' object has no attribute 'ds_status'


Update: it looks like an incompatibility with DeepSpeed's ZeRO-3 implementation; see https://github.com/microsoft/DeepSpeed/issues/1757. Switching to ZeRO-2 makes the error go away.
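
For reference, a minimal sketch of selecting ZeRO stage 2 through the HF Trainer (the config keys are standard DeepSpeed ones; output_dir and the "auto" values are illustrative):

from transformers import TrainingArguments

ds_config = {
    "zero_optimization": {"stage": 2},         # ZeRO-2 instead of ZeRO-3
    "train_micro_batch_size_per_gpu": "auto",  # let the Trainer fill these in
    "gradient_accumulation_steps": "auto",
}

args = TrainingArguments(
    output_dir="out",             # illustrative
    evaluation_strategy="steps",
    deepspeed=ds_config,          # a dict or a path to a JSON file both work
)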

CarolXh commented 1 year ago

Seems like it is training in eval mode. Check /home/uos/.cache/huggingface/modules/transformers_modules/Baichuan2-13B-Chat/modeling_baichuan.py, line 354. Maybe you should call model.train() before training.

I used the official fine-tuning script. The script already calls trainer.train(), and I hit the problem during that run. I set the training argument eval-strategy=steps, and that is what triggers it: training steps work fine, but the run crashes at evaluation steps.

applepieiris commented 1 year ago

The same error occurs at inference time.

applepieiris commented 1 year ago

The same error occurs at inference time.

I think it's a package-version problem; the developers should pin the package versions in requirements.

CarolXh commented 1 year ago

Possibly. I deliberately uninstalled torch and reinstalled 2.0.0, the version required in requirements, but that didn't help either.

HarlynDN commented 1 year ago

Quoting zwx980624's report above: fine-tuning Baichuan2 with DeepSpeed fails at the eval step with AttributeError: 'Parameter' object has no attribute 'ds_status'; it looks like an incompatibility with DeepSpeed's ZeRO-3 (microsoft/DeepSpeed#1757), and switching to ZeRO-2 avoids the error.

Hit the same problem: inference under DeepSpeed ZeRO-3 fails, while the same setup worked fine with Baichuan1.

sun1092469590 commented 1 year ago

Is there any fix besides switching to ZeRO-2? Full-parameter training under ZeRO-2 needs a lot of resources.

lcg0808 commented 12 months ago

The same error occurs at inference time.

I think it's a package-version problem; the developers should pin the package versions in requirements.

Did you ever resolve this? (Updating packages and the like doesn't seem to help?)

calvinzhan commented 11 months ago

The cause is that the line self.weight = nn.Parameter(nn.functional.normalize(self.weight)) in the code below replaces the parameter, wiping out the fields DeepSpeed stage 3 attaches to it (such as ds_status).

The first generation didn't normalize the head, so it didn't have this problem.

import math
import torch
import torch.nn as nn

class NormHead(nn.Module):
    def __init__(self, hidden_size, vocab_size, bias=False):
        super().__init__()
        self.weight = nn.Parameter(torch.empty((vocab_size, hidden_size)))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        self.first_flag = True

    def forward(self, hidden_states):
        if self.training:
            norm_weight = nn.functional.normalize(self.weight)
        elif self.first_flag:
            self.first_flag = False
            self.weight = nn.Parameter(nn.functional.normalize(self.weight))
            norm_weight = self.weight
        else:
            norm_weight = self.weight
        return nn.functional.linear(hidden_states, norm_weight)
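
A minimal sketch of the failure mode described above: ZeRO-3 attaches bookkeeping attributes such as ds_status to every Parameter, and wrapping the tensor in a fresh nn.Parameter discards them (here ds_status is set by hand to stand in for DeepSpeed):

import torch
import torch.nn as nn

w = nn.Parameter(torch.randn(8, 4))
w.ds_status = "AVAILABLE"  # stand-in for the attribute ZeRO-3 attaches

# What NormHead.forward does on the first eval call:
with torch.no_grad():
    w = nn.Parameter(nn.functional.normalize(w))

print(hasattr(w, "ds_status"))  # False: the DeepSpeed bookkeeping is gone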

kimlee1874 commented 11 months ago

Quoting calvinzhan's analysis above (the self.weight = nn.Parameter(nn.functional.normalize(self.weight)) line in NormHead wipes out the attributes DeepSpeed stage 3 attaches):

How can this problem be solved?

princewang1994 commented 4 months ago

Quoting calvinzhan's analysis and kimlee1874's question above (the NormHead re-wrapping issue, and how to solve it):

Hit the same problem. According to this issue, the nn.Parameter() re-wrapping is only there to speed up inference, so rewriting it as follows lets validation pass:

import math
import torch
import torch.nn as nn

class NormHead(nn.Module):
    def __init__(self, hidden_size, vocab_size, bias=False):
        super().__init__()
        self.weight = nn.Parameter(torch.empty((vocab_size, hidden_size)))
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        # self.first_flag = True

    def forward(self, hidden_states):
        # if self.training:
        #     norm_weight = nn.functional.normalize(self.weight)
        # elif self.first_flag:
        #     self.first_flag = False
        #     self.weight = nn.Parameter(nn.functional.normalize(self.weight))
        #     norm_weight = self.weight
        # else:
        #     norm_weight = self.weight
        norm_weight = nn.functional.normalize(self.weight)
        return nn.functional.linear(hidden_states, norm_weight)
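
A quick sanity check of the patched head (assumes the class definition above; the sizes are illustrative): since train and eval now take the same normalization path, both modes should produce identical logits.

import torch

head = NormHead(hidden_size=16, vocab_size=32)
x = torch.randn(2, 16)

head.train()
out_train = head(x)
head.eval()
with torch.no_grad():
    out_eval = head(x)

print(torch.allclose(out_train, out_eval))  # True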