QwenLM / Qwen

The official repository of Qwen (通义千问), the chat and pretrained large language model proposed by Alibaba Cloud.
Apache License 2.0

[BUG] Is embedding input (inputs_embeds) supported at the generate stage? #1047

Closed. AZYoung233 closed this issue 7 months ago.

AZYoung233 commented 8 months ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in the FAQ?

Current Behavior

Hello, sorry to bother you. I'm new to LLMs and am trying to embed time series into an LLM. Why does Qwen raise an error when embeddings are passed as input at the generate stage? This doesn't seem to happen with LLaMA, and using embeddings during training doesn't raise an error either.

outputs = self.model.generate(inputs_embeds=fusion_embedding, num_beams=1, do_sample=False, max_new_tokens=self.max_new_tokens)

File "/home/young/anaconda3/envs/LLM/lib/python3.10/site-packages/transformers/generation/utils.py", line 682, in _maybe_initialize_input_ids_for_generation raise ValueError("bos_token_id has to be defined when no input_ids are provided.") ValueError: bos_token_id has to be defined when no input_ids are provided.

Expected Behavior

No response

Steps To Reproduce

No response

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response

jklj077 commented 8 months ago

The current code doesn't really support this; it's a bit tricky. You could try setting the bos_token to something like <|endoftext|>, though that's not guaranteed to work. Alternatively, wait for the next version of the code.
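One way to read that suggestion, as a minimal sketch: register the <|endoftext|> id as bos_token_id on the model's generation config before calling generate(). The checkpoint name and the dummy fusion_embedding below are placeholders, and it is assumed that the Qwen tokenizer loaded with trust_remote_code resolves "<|endoftext|>" through convert_tokens_to_ids; as noted above, this is not guaranteed to work.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint name; substitute the checkpoint actually in use.
name = "Qwen/Qwen-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(name, trust_remote_code=True).eval()

# Register <|endoftext|> as the BOS token id on the generation config so that
# generate() can build the placeholder input_ids it needs for inputs_embeds.
model.generation_config.bos_token_id = tokenizer.convert_tokens_to_ids("<|endoftext|>")

# Dummy stand-in for the fused time-series embedding from the report.
fusion_embedding = torch.randn(
    1, 16, model.config.hidden_size, dtype=next(model.parameters()).dtype
)

outputs = model.generate(
    inputs_embeds=fusion_embedding,
    num_beams=1,
    do_sample=False,
    max_new_tokens=32,
)
```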

jklj077 commented 7 months ago

Qwen1.5 has been released; we recommend using the latest model and code.

AZYoung233 commented 7 months ago

Thank you very much for your reply; I will give it a try.

HJYao00 commented 6 months ago

Did you manage to solve this?

AZYoung233 commented 6 months ago

Did you manage to solve this?

Solved. Just manually add one of the officially supported special token ids.
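Presumably this amounts to passing one of Qwen's registered special token ids as bos_token_id on the generate() call itself. A sketch that reuses the variable names from the snippet in the report; the convert_tokens_to_ids lookup and the 151643 value are assumptions about the Qwen(1) tokenizer:

```python
# Look up an officially supported special token and pass it as bos_token_id,
# so generate() can create the placeholder input_ids it requires.
bos_id = tokenizer.convert_tokens_to_ids("<|endoftext|>")  # 151643 in the Qwen(1) tokenizer

outputs = self.model.generate(
    inputs_embeds=fusion_embedding,
    bos_token_id=bos_id,
    num_beams=1,
    do_sample=False,
    max_new_tokens=self.max_new_tokens,
)
```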

20191864218 commented 6 months ago

Did you manage to solve this?

Solved. Just manually add one of the officially supported special token ids.

Hi, could you explain how and where you added the special token id? I'm going crazy over this /(ㄒoㄒ)/~~. Thank you for your reply.