Error message (step 1, using the generate method):
D:\anaconda\Lib\site-packages\transformers\generation\utils.py:1133: UserWarning: Using the model-agnostic default max_length (=20) to control the generation length. We recommend setting max_new_tokens to control the maximum length of the generation.
warnings.warn(
Traceback (most recent call last):
File "d:\code\GLM3\output_attention.py", line 22, in
attention_weights = output.attentions
^^^^^^^^^^^^^^^^^
AttributeError: 'Tensor' object has no attribute 'attentions'
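For reference, a minimal sketch of the generate-based approach (the original test script at line 22 of output_attention.py is not shown, and the checkpoint name below is an assumption). The AttributeError means generate() returned a bare tensor of token ids; generate() only returns an output object with an .attentions attribute when return_dict_in_generate=True is passed, and setting max_new_tokens also silences the UserWarning above:

from transformers import AutoTokenizer, AutoModel

# Assumed checkpoint; replace with the model path actually used in output_attention.py.
# In practice you would also move the model to GPU / half precision.
model_name = "THUDM/chatglm3-6b"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True).eval()

inputs = tokenizer("你好", return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=32,             # replaces the model-agnostic max_length default
    output_attentions=True,        # request per-layer attention weights
    return_dict_in_generate=True,  # return a structured output instead of a bare tensor
)

# With return_dict_in_generate=True the .attentions attribute exists, although for
# ChatGLM3 it can still come back empty/None (see the forward() analysis below).
attention_weights = output.attentions
print(type(output), attention_weights is None)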
2. The forward method
Source code from the modeling file:
def forward(
        self,
        input_ids: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.Tensor] = None,
        attention_mask: Optional[torch.Tensor] = None,
        past_key_values: Optional[Tuple[torch.FloatTensor]] = None,
        inputs_embeds: Optional[torch.Tensor] = None,
        labels: Optional[torch.Tensor] = None,
        use_cache: Optional[bool] = None,
        output_attentions: Optional[bool] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        return_last_logit: Optional[bool] = False,
):
    use_cache = use_cache if use_cache is not None else self.config.use_cache
    return_dict = return_dict if return_dict is not None else self.config.use_return_dict

    transformer_outputs = self.transformer(
        input_ids=input_ids,
        position_ids=position_ids,
        attention_mask=attention_mask,
        past_key_values=past_key_values,
        inputs_embeds=inputs_embeds,
        use_cache=use_cache,
        output_hidden_states=output_hidden_states,
        return_dict=return_dict,
    )

    hidden_states = transformer_outputs[0]
    if return_last_logit:
        hidden_states = hidden_states[-1:]
    lm_logits = self.transformer.output_layer(hidden_states)
    lm_logits = lm_logits.transpose(0, 1).contiguous()

    loss = None
    if labels is not None:
        lm_logits = lm_logits.to(torch.float32)

        # Shift so that tokens < n predict n
        shift_logits = lm_logits[..., :-1, :].contiguous()
        shift_labels = labels[..., 1:].contiguous()
        # Flatten the tokens
        loss_fct = CrossEntropyLoss(ignore_index=-100)
        loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1))

        lm_logits = lm_logits.to(hidden_states.dtype)
        loss = loss.to(hidden_states.dtype)

    if not return_dict:
        output = (lm_logits,) + transformer_outputs[1:]
        return ((loss,) + output) if loss is not None else output

    return CausalLMOutputWithPast(
        loss=loss,
        logits=lm_logits,
        past_key_values=transformer_outputs.past_key_values,
        hidden_states=transformer_outputs.hidden_states,
        attentions=transformer_outputs.attentions,
    )
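Note that in the code quoted above, forward() accepts output_attentions but never passes it on to self.transformer(...) (only output_hidden_states is forwarded), so transformer_outputs.attentions stays None and the CausalLMOutputWithPast is built with attentions=None. That is presumably why the direct-forward test described below prints None. A minimal check, reusing the model and tokenizer from the sketch above:

import torch

inputs = tokenizer("你好", return_tensors="pt")
with torch.no_grad():
    out = model(
        input_ids=inputs["input_ids"],
        output_attentions=True,  # accepted by the signature, but never forwarded
        return_dict=True,
    )
print(out.attentions)  # -> None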
System Info / 系統信息
Task description: I am working on interpretability for large language models and need to output the attention weights of every layer, but neither passing the output_attentions argument to the generate method nor calling the forward method from the modeling file produced them.
Who can help? / 谁可以帮助到您?
No response
Information / 问题信息
Reproduction / 复现过程
1. Using the generate method: the resulting error is the one quoted at the top of this issue (UserWarning about max_length, then AttributeError: 'Tensor' object has no attribute 'attentions').
2. Using the forward method (the modeling source is quoted above).
Test script:
Error: the attentions field of the output is None.
Expected behavior / 期待表现
I would like to obtain each layer's attention weights while the model is generating.
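For comparison, a sketch with a small model whose modeling code does propagate output_attentions (gpt2 here, purely as an illustration) shows the structure being asked for: generate() returns one tuple per generated token, each containing one attention tensor per layer of shape (batch, num_heads, query_len, key_len):

from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
m = AutoModelForCausalLM.from_pretrained("gpt2").eval()

enc = tok("Hello", return_tensors="pt")
out = m.generate(
    **enc,
    max_new_tokens=3,
    output_attentions=True,
    return_dict_in_generate=True,
)

# tuple over generation steps -> tuple over layers -> (batch, heads, q_len, kv_len)
print(len(out.attentions), len(out.attentions[0]), out.attentions[0][0].shape)

Getting the same structure out of ChatGLM3 would require the local modeling code to pass output_attentions through and actually return the weights, which the forward() quoted above currently does not do.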