Closed dszpr closed 7 months ago
Yuan2.0 is a decoder-only architecture, while T5 is encoder-decoder, so the model structures differ and you cannot pass two attention masks at once. The model uses a causal setup: the attn_mask is a lower-triangular matrix, so no extra attention_mask input is needed. Also, a decoder-only model cannot take input_ids and inputs_embeds at the same time.
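To illustrate the point above, here is a minimal sketch (PyTorch; the function name is illustrative, not Yuan2.0's actual internals) of the lower-triangular causal mask a decoder-only model builds for itself, which is why no external attention_mask is required for the causal part:

```python
import torch

def build_causal_mask(seq_length: int) -> torch.Tensor:
    # Lower-triangular matrix: position i may attend only to positions j <= i.
    return torch.tril(torch.ones(seq_length, seq_length, dtype=torch.bool))

mask = build_causal_mask(4)
```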
Thank you for your reply! I redid the modification based on the OPT model, which is also decoder-only, but I still get this error:

```
File "/root/.cache/huggingface/modules/transformers_modules/yuan_hf_model.py", line 624, in _prepare_decoder_attention_mask_training
    micro_batch_size, seq_length = input_id.size()
AttributeError: 'NoneType' object has no attribute 'size'
```

I then changed the source code to call _prepare_decoder_attention_mask instead of _prepare_decoder_attention_mask_training, which does not need input_id, and after that the program runs normally. But I am not sure whether this change is reasonable?
@dszpr Is the result correct after this modification?
Thank you for your reply! After the modification, training runs normally, but the loss does not converge and eval performance is very poor. The forward code is as follows:

```python
def forward(self, samples):
    image = samples["image"]
    with self.maybe_autocast():
        image_embeds = self.ln_vision(self.visual_encoder(image))
    image_atts = torch.ones(image_embeds.size()[:-1], dtype=torch.long).to(
        image.device
    )

    query_tokens = self.query_tokens.expand(image_embeds.shape[0], -1, -1)
    query_output = self.Qformer.bert(
        query_embeds=query_tokens,
        encoder_hidden_states=image_embeds,
        encoder_attention_mask=image_atts,
        return_dict=True,
    )

    inputs_opt = self.opt_proj(query_output.last_hidden_state)
    atts_opt = torch.ones(inputs_opt.size()[:-1], dtype=torch.long).to(image.device)

    self.opt_tokenizer.padding_side = "left"

    text = [t + "\n" for t in samples["text_input"]]

    input_tokens = self.opt_tokenizer(
        text,
        return_tensors="pt",
        padding="longest",
        truncation=True,
        max_length=self.max_txt_len,
    ).to(image.device)
    output_tokens = self.opt_tokenizer(
        samples["text_output"],
        padding="longest",
        truncation=True,
        max_length=self.max_txt_len,
        return_tensors="pt",
    ).to(image.device)

    batch_input_tokens_input_ids = []
    batch_input_tokens_atts = []
    batch_atts_opt = []
    batch_inputs_opt = []

    for b, n in enumerate(samples["n_answers"]):
        batch_input_tokens_input_ids += [input_tokens.input_ids[b]] * n
        batch_input_tokens_atts += [input_tokens.attention_mask[b]] * n
        batch_atts_opt += [atts_opt[b]] * n
        batch_inputs_opt += [inputs_opt[b]] * n

    batch_input_tokens_input_ids = torch.stack(batch_input_tokens_input_ids, dim=0)
    batch_input_tokens_atts = torch.stack(batch_input_tokens_atts, dim=0)
    batch_atts_opt = torch.stack(batch_atts_opt, dim=0)
    batch_inputs_opt = torch.stack(batch_inputs_opt, dim=0)

    targets = output_tokens.input_ids.masked_fill(
        output_tokens.input_ids == self.opt_tokenizer.pad_token_id, -100
    )

    inputs_embeds = self.opt_model.model.embed_tokens(batch_input_tokens_input_ids)
    inputs_embeds = torch.cat([batch_inputs_opt, inputs_embeds], dim=1)
    attention_mask = torch.cat([batch_atts_opt, batch_input_tokens_atts], dim=1)

    with self.maybe_autocast():
        outputs = self.opt_model(
            inputs_embeds=inputs_embeds,
            attention_mask=attention_mask,
            return_dict=True,
            labels=targets,
        )
    loss = outputs.loss

    return {"loss": loss}
```

I did not rename the variables here, so they all use OPT's names, but the model actually loaded is Yuan. One question: when passing attention_mask to a decoder-only model, should the mask be applied to the output tokens in the target rather than to the input tokens? @Shawn-IEITSystems @zhaoxudong01
Masking the target happens at loss-computation time, so that only the loss on the tokens to be generated is computed. attention_mask, by contrast, masks the input structure and is used during the attention computation. Yuan 2.0 uses flash attention, and the mask defaults to the lower-triangular (causal) mask. Is the current code using the Hugging Face version of Yuan 2.0?
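To make that distinction concrete, here is a minimal sketch (PyTorch; the pad id and tensors are illustrative, not taken from the thread):

```python
import torch

pad_token_id = 0  # illustrative pad id
output_ids = torch.tensor([[5, 6, 7, pad_token_id]])

# Loss-side mask: positions set to -100 are ignored by the cross-entropy
# loss, so only the target tokens contribute to the loss.
labels = output_ids.masked_fill(output_ids == pad_token_id, -100)

# Attention-side mask: marks real tokens (1) versus padding (0); it is
# consumed inside the attention computation, and the causal (lower-
# triangular) structure is applied on top of it by the model itself.
attention_mask = (output_ids != pad_token_id).long()
```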
Yes, I am using the Hugging Face version. Thank you very much; I will close the issue.
Hi! I want to replace the T5-XL in BLIP2 with Yuan2.0-2B. BLIP2's structure is ViT-QFormer-LLM, and my pretraining task is Vision-QA, i.e., answering questions based on image content. During pretraining only the QFormer is trained; the ViT and LLM are both frozen. After replacing T5-XL with Yuan2.0-2B, loading the Yuan2.0-2B tokenizer and model weights works fine, but an error occurs at the end when generating the output and computing the loss. The forward code when using T5 as the LLM begins:

```python
def forward(self, samples):
    image = samples["image"]
```

The code modified to use Yuan2.0 as the LLM begins:

```python
def forward(self, samples):
    image = samples["image"]
```
The error message is:
```
Traceback (most recent call last):
  File "train.py", line 103, in <module>
    main()
  File "train.py", line 99, in main
    runner.train()
  File "/workspace/code/LAVIS/lavis/runners/runner_base.py", line 385, in train
    train_stats = self.train_epoch(cur_epoch)
  File "/workspace/code/LAVIS/lavis/runners/runner_base.py", line 452, in train_epoch
    return self.task.train_epoch(
  File "/workspace/code/LAVIS/lavis/tasks/base_task.py", line 116, in train_epoch
    return self._train_inner_loop(
  File "/workspace/code/LAVIS/lavis/tasks/base_task.py", line 226, in _train_inner_loop
    loss, loss_dict = self.train_step(model=model, samples=samples)
  File "/workspace/code/LAVIS/lavis/tasks/base_task.py", line 63, in train_step
    output = model(samples)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/distributed.py", line 1519, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/code/LAVIS/lavis/models/blip2_models/blip2_yuan.py", line 222, in forward
    outputs = self.t5_model(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/yuan_hf_model.py", line 937, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/yuan_hf_model.py", line 718, in forward
    attention_mask, _ = self._prepare_decoder_attention_mask_training(input_ids, inputs_embeds, self.eod_token, reset_mask_flag, self.reset_attention_mask, self.reset_position_ids)
  File "/root/.cache/huggingface/modules/transformers_modules/yuan_hf_model.py", line 624, in _prepare_decoder_attention_mask_training
    micro_batch_size, seq_length = input_id.size()
AttributeError: 'NoneType' object has no attribute 'size'
```
I noticed that when the T5 model generates output it takes both attention_mask and decoder_attention_mask, whereas Yuan2.0 takes only a single attention_mask; however, the source code calls _prepare_decoder_attention_mask_training to generate the decoder attention mask. Do I need to pass input_id when calling the model to generate output? I tried passing input_id, but got an error saying inputs_embeds and input_id cannot both be passed. Could you help me see how to modify the code so that Yuan2.0 can replace T5?
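For reference, one workaround consistent with the discussion in this thread (avoiding the input_ids-dependent mask builder) is to derive the mask shape from inputs_embeds instead. A minimal sketch, assuming PyTorch, with all names illustrative rather than Yuan2.0's actual internals:

```python
import torch

def causal_mask_from_embeds(inputs_embeds: torch.Tensor) -> torch.Tensor:
    # Take (batch, seq_len) from the embeddings, since input_ids is None
    # when inputs_embeds is passed, then build the usual causal mask.
    batch, seq_len, _ = inputs_embeds.size()
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    return causal.unsqueeze(0).expand(batch, -1, -1)

mask = causal_mask_from_embeds(torch.zeros(2, 3, 8))
```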