git-cloner / llama2-lora-fine-tuning

llama2 finetuning with deepspeed and lora
MIT License
162 stars 14 forks

Is there a limit on the decoder output length? #11

Open MarsMeng1994 opened 11 months ago

MarsMeng1994 commented 11 months ago
parser.add_argument('--base_model', default="llama-2-7b-chat-hf/", type=str)
parser.add_argument('--lora_weights', default="tloen/alpaca-lora-7b", type=str,
                    help="If None, perform inference on the base model")
parser.add_argument('--load_8bit', default="True", type=bool,
                    help='only use CPU for inference')

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 3/3 [00:15<00:00, 5.12s/it]
Question: Write me a user login and registration system, with Vue for the front end, Go for the back end, and MySQL for the database design; write out the code.
This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (2048). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.


little51 commented 11 months ago


MarsMeng1994 commented 11 months ago


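However, once decoding goes past 2048 tokens the output turns into garbage, and if no end-of-sequence token is detected it keeps generating until memory runs out. Is there a setting that stops generation automatically when the maximum length is reached?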
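One common way to get this behaviour, sketched here under assumptions rather than taken from this repository's scripts, is to cap the number of generated tokens so that prompt plus output never exceeds the context window. In Hugging Face transformers, `generate` accepts a `max_new_tokens` argument and stops once that many tokens have been produced, even if no end-of-sequence token appears. The helper name `remaining_budget` and the `reserve` parameter below are hypothetical; 2048 is the limit reported in the warning above.

```python
def remaining_budget(prompt_len: int, context_window: int = 2048, reserve: int = 0) -> int:
    """Return how many new tokens still fit in the context window.

    prompt_len     -- number of tokens already in the prompt (history + question)
    context_window -- the model's maximum sequence length (2048 per the warning)
    reserve        -- hypothetical safety margin kept free
    """
    return max(context_window - prompt_len - reserve, 0)


# Example: a 1500-token prompt leaves room for at most 548 new tokens.
budget = remaining_budget(1500)

# Assumed usage with transformers (not executed here):
#   prompt_len = inputs["input_ids"].shape[1]
#   budget = remaining_budget(prompt_len)
#   if budget > 0:
#       output = model.generate(**inputs, max_new_tokens=budget)
```

If the budget is zero, the prompt itself already fills the window, so the safer choice is to shorten the prompt instead of calling `generate`.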

MarsMeng1994 commented 11 months ago



little51 commented 11 months ago


MarsMeng1994 commented 11 months ago


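My understanding is that the history length also counts toward the 2048 limit, since the history is simply prepended to the current input. If the previous step already exceeded the limit, the next step probably cannot generate anything either.
[image]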
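If the history really is prepended to the current input, one possible mitigation is to drop the oldest turns until history, question, and the reserved output budget fit within the window. The following is a minimal sketch under that assumption, not the repository's own logic. `trim_history` and `count_words` are hypothetical names, and the whitespace word count only stands in for a real tokenizer's token count.

```python
from typing import Callable, List


def count_words(text: str) -> int:
    """Stand-in token counter; a real setup would count tokenizer tokens instead."""
    return len(text.split())


def trim_history(
    history: List[str],
    query: str,
    count_tokens: Callable[[str], int] = count_words,
    context_window: int = 2048,
    max_new_tokens: int = 512,
) -> List[str]:
    """Keep the most recent turns that fit next to the query and the output budget."""
    budget = context_window - max_new_tokens - count_tokens(query)
    kept: List[str] = []
    # Walk from the newest turn backwards and stop at the first turn that does not fit.
    for turn in reversed(history):
        cost = count_tokens(turn)
        if cost > budget:
            break
        kept.append(turn)
        budget -= cost
    kept.reverse()  # restore chronological order
    return kept


# Tiny example with a 10-"token" window and 3 tokens reserved for the answer.
turns = ["a b c", "d e", "f g h i"]
trimmed = trim_history(turns, "q", context_window=10, max_new_tokens=3)
```

Dropping whole turns from the oldest end keeps the most recent context intact; truncating inside a turn would save more tokens but can cut a message mid-sentence.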