Closed: hzjane closed this issue 7 months ago
I don't quite understand this test case.
[14880, 107485, 103929, 113272, 100178, 271, 18493]
Aren't these already output tokens?
Each step inside generate is encapsulated and not exposed externally, so how do you modify the input of a single step? The input length of a single step defaults to 1, but your single-step input seems to have length 5.
Contributor
You can modify it here: https://github.com/huggingface/transformers/blob/v4.36.2/src/transformers/generation/utils.py#L2576
model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
# input_ids must be 2-D: (batch, seq_len)
test = torch.tensor([[103929, 113272, 100178, 271, 18493]])
model_inputs['input_ids'] = test
outputs = self(
    **model_inputs,
    return_dict=True,
    output_attentions=output_attentions,
    output_hidden_states=output_hidden_states,
)
In other words, I take the tokens that Qwen originally generated one by one (split across two runs), and in the second run feed several of the first run's tokens in at once; this produces different results. Before this commit there was no problem. The cause is that your change to causal_mask means n-token inputs are no longer supported. But HF's assisted generation (assistant_generate) does feed in n tokens at once, so this breaks it. I have tested llama, baichuan, and chatglm, and none of them have this problem.
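To see why a multi-token step goes wrong without a causal mask, here is a self-contained toy (not Qwen's actual attention code; scalar keys/values, one head, softmax attention). It decodes token by token as a reference, then replays the last five positions as one batched step on top of a 2-token kv cache, with and without the causal mask:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    return [e / sum(es) for e in es]

def attend(q, keys, values, allowed):
    # single-query attention over scalar keys/values;
    # disallowed positions get a -inf score (weight 0 after softmax)
    scores = [q * k if ok else float("-inf") for k, ok in zip(keys, allowed)]
    return sum(w * v for w, v in zip(softmax(scores), values))

x = [0.3, -0.5, 0.8, 0.1, -0.2, 0.6, 0.4]  # toy hidden states, one per token

# reference: decode one token at a time; each query sees exactly its prefix
step_by_step = [
    attend(x[i], x[: i + 1], x[: i + 1], [True] * (i + 1))
    for i in range(len(x))
]

# replay positions 2..6 as ONE multi-token step over a 2-token cache;
# all 7 keys/values are available once the batched forward runs
cache = 2
with_mask = [
    attend(x[cache + i], x, x, [j <= cache + i for j in range(len(x))])
    for i in range(len(x) - cache)
]
no_mask = [
    attend(x[cache + i], x, x, [True] * len(x))
    for i in range(len(x) - cache)
]

# with the mask, the batched step reproduces token-by-token decoding;
# without it, the earlier queries "see the future" and diverge
assert all(abs(a - b) < 1e-9 for a, b in zip(with_mask, step_by_step[cache:]))
assert abs(no_mask[0] - step_by_step[cache]) > 1e-6
```

Only the very last query is unaffected by the missing mask (it may attend to everything anyway), which is why single-token decoding works while assisted generation breaks.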
Also, looking at how llama uses causal_mask (https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama/modeling_llama.py#L735), it is applied whenever q_len > 1. The change below is probably the simplest fix: apply causal_mask in every case that needs it, not only on the first (prefill) step.
502c502
< if query.size(1) == key_size:
---
> if query.size(1) > 1:
505a506,508
> causal_mask = causal_mask[
> :, :, key.size(-2) - query.size(-2): key.size(-2), :key.size(-2)
> ]
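The slice in the diff above, `causal_mask[..., key_len - q_len : key_len, :key_len]`, picks out the mask rows for the current query tokens against all cached and new keys. A minimal sketch with nested lists (shapes and the buffer size are assumptions, not Qwen's actual buffers; `True` means "may attend"):

```python
MAX_LEN = 16  # assumed size of the pre-registered lower-triangular buffer

# full causal mask: position i may attend to positions j <= i
full_mask = [[j <= i for j in range(MAX_LEN)] for i in range(MAX_LEN)]

def slice_mask(key_len, q_len):
    """Rows for the q_len newest positions, columns for all key_len keys."""
    return [row[:key_len] for row in full_mask[key_len - q_len : key_len]]

# 2 cached tokens + 5 new tokens => key_len = 7, q_len = 5
m = slice_mask(key_len=7, q_len=5)
print(m[0])   # first new token sees the 2 cached tokens plus itself
print(m[-1])  # last new token sees all 7 positions
```

With `q_len == 1` the slice degenerates to a single all-`True` row, which is why plain one-token decoding never exposed the bug.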
Could you send me a copy of the complete modified code? My model's output is unstable and I'd like to test whether this fix helps.
In the recent Qwen1.5 release, its codebase has been integrated into the transformers package, aligning with the established practices of the transformers library. This integration means that the transformers ecosystem now natively supports Qwen1.5 models in most scenarios without any additional configuration, including the assistant_generate functionality.
For more information about this integration, discussions on how to leverage Qwen1.5 within the transformers environment, and for updates on community feedback and enhancements, please visit the official Qwen1.5 GitHub repository at https://github.com/QwenLM/Qwen1.5.
Is there an existing issue / discussion for this?
Is there an existing answer for this in FAQ?
Current Behavior
[14880, 107485, 103929, 113272, 100178, 271, 18493] — these are the 7 tokens that the latest qwen-14b-chat generates one at a time for a given input via a direct call to .generate(). If, on the third round, when the input is [102939], I manually set the input to test (5 tokens) and then call the model, it generates results different from the single-token run: 271 vs 3837, and 18493 vs 100345.
Expected Behavior
Apart from the newly generated token, the outputs should be identical. Looking at the commit history of qwen's modeling_qwen.py, the problem comes from a change to causal_mask: it now stays None whenever multiple tokens are input at once while a kv_cache is present, so the result no longer matches what is expected. After changing the current condition and adding the causal_mask slicing, I get the correct output.
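The broken condition can be checked in isolation. A sketch, with assumed names: `key_size` is the total key length including the kv cache, `q_len` the number of query tokens in the current step:

```python
def uses_causal_mask_old(q_len, key_size):
    # original check: true only at prefill, when no cache exists yet
    return q_len == key_size

def uses_causal_mask_new(q_len, key_size):
    # proposed check: any step with more than one query token needs the mask
    return q_len > 1

# prefill: 5 prompt tokens, empty cache -> both apply the mask
assert uses_causal_mask_old(5, 5) and uses_causal_mask_new(5, 5)

# ordinary decode: 1 new token, 6 cached -> neither needs the mask
assert not uses_causal_mask_old(1, 7) and not uses_causal_mask_new(1, 7)

# assisted generation: 5 new tokens on a 2-token cache;
# the old check skips the mask (the bug), the new check applies it
assert not uses_causal_mask_old(5, 7)
assert uses_causal_mask_new(5, 7)
```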
Here is the diff of my changes:
Steps To Reproduce
No response
Environment
Anything else?
No response