git-cloner / llama2-lora-fine-tuning

llama2 finetuning with deepspeed and lora
https://gitclone.com/aiit/chat/
MIT License

Is there a limit on the decoder's output length? #11

Open MarsMeng1994 opened 11 months ago

MarsMeng1994 commented 11 months ago
parser.add_argument('--base_model', default="llama-2-7b-chat-hf/", type=str)
parser.add_argument('--lora_weights', default="tloen/alpaca-lora-7b", type=str,
                    help="If None, perform inference on the base model")
# Note: the original used type=bool with default="True", but bool() on any
# non-empty string (including "False") is True, so the flag could never be
# disabled from the command line. BooleanOptionalAction (Python 3.9+) fixes this.
parser.add_argument('--load_8bit', default=True, action=argparse.BooleanOptionalAction,
                    help='load the model with 8-bit quantization')

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 3/3 [00:15<00:00, 5.12s/it]

Question: Write me a user login/registration system: Vue on the frontend, Go on the backend, MySQL for the database design; write out the code.

This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (2048). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.

Is there a limit on the output length? 2048 feels too short; how can I change it?

little51 commented 11 months ago

It is governed by the block_size parameter used during fine-tuning; 2048 is already long enough. If you need longer answers, you can pass the previous conversation turns in as history.

MarsMeng1994 commented 11 months ago

> It is governed by the block_size parameter used during fine-tuning; 2048 is already long enough. If you need longer answers, you can pass the previous conversation turns in as history.

But once decoding goes past 2048 the output turns to garbage, and if no end-of-sequence token is detected it just keeps generating until memory is exhausted. Is there a setting that stops generation automatically at the maximum length?
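For reference, `transformers` lets each call cap the reply via `generate(max_new_tokens=...)`, so generation halts at a hard limit even when no EOS token appears. A minimal sketch of the budget arithmetic (the helper name is mine, not from this repo):

```python
# Sketch (helper name assumed, not from this repo): compute a max_new_tokens
# budget so generation stops at the context limit instead of overrunning it.
def generation_budget(prompt_tokens: int, block_size: int = 2048) -> int:
    """Tokens left for the reply inside the model's context window."""
    return max(block_size - prompt_tokens, 0)

# Illustrative use with transformers (assumed call site):
#   outputs = model.generate(
#       **inputs,
#       max_new_tokens=generation_budget(inputs["input_ids"].shape[1]),
#       eos_token_id=tokenizer.eos_token_id,  # also stop early on EOS
#   )
```

With this cap in place the call returns at the limit rather than looping until memory runs out.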

MarsMeng1994 commented 11 months ago

> It is governed by the block_size parameter used during fine-tuning; 2048 is already long enough. If you need longer answers, you can pass the previous conversation turns in as history.

By "passing in the history", do you mean feeding the unfinished output from the previous step back in as input?

little51 commented 11 months ago

The output may simply have no end-of-sequence token, and there is no good fix for that. Put the previous input and output into the history parameter, and use "继续" ("continue") as the new prompt. This example fine-tunes Llama-2-7b-chat and the results are mediocre; I later fine-tuned the original model and got somewhat better results: https://github.com/git-cloner/Llama2-chinese
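The "put the last turn into history and prompt with 继续" idea can be sketched as below; the Question/Answer template is an assumption for illustration, not this repo's actual prompt format:

```python
# Sketch of the maintainer's suggestion: concatenate the previous turn(s)
# in front of a new "继续" ("continue") prompt. The Question/Answer template
# below is an assumption, not this repo's actual prompt format.
def build_prompt(history, user_input):
    """history: list of (question, answer) pairs from earlier turns."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in history]
    parts.append(f"Question: {user_input}\nAnswer:")
    return "\n".join(parts)

# Last turn's truncated reply goes back in as history, "继续" is the new prompt.
history = [("write a user login/registration system", "<truncated answer from last turn>")]
prompt = build_prompt(history, "继续")
```

The model then resumes from where the previous reply was cut off, at the cost of spending context tokens on the history.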

MarsMeng1994 commented 11 months ago

> The output may simply have no end-of-sequence token, and there is no good fix for that. Put the previous input and output into the history parameter, and use "继续" ("continue") as the new prompt. This example fine-tunes Llama-2-7b-chat and the results are mediocre; I later fine-tuned the original model and got somewhat better results: https://github.com/git-cloner/Llama2-chinese

My understanding is that the history also counts toward the 2048-token limit, since it is just concatenated in front of the current input. If the previous step already overflowed, the next step won't be able to generate anything either.
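That is correct: history shares the same 2048-token window, so older turns have to be dropped before they overflow it. A minimal sketch, assuming the caller supplies a `token_len` counter (e.g. `lambda s: len(tokenizer(s).input_ids)`):

```python
# Sketch: drop the oldest turns until prompt + reply fit in the 2048-token
# window, since history is simply prepended to the current input.
def trim_history(history, new_input, token_len, block_size=2048, reply_budget=512):
    """token_len: callable mapping text to a token count (e.g. a tokenizer)."""
    history = list(history)

    def prompt_tokens():
        return token_len("".join(q + a for q, a in history) + new_input)

    while history and prompt_tokens() + reply_budget > block_size:
        history.pop(0)  # discard the oldest turn first
    return history
```

This keeps each call within the window at the cost of forgetting the earliest turns; `reply_budget` reserves room for the answer itself.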