Is there an existing issue for this?
Current Behavior
from transformers import AutoTokenizer, AutoModel
model_path = "models/chatglm2-6b"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda()
model = model.eval()
prefix = 0
for response, history in model.stream_chat(tokenizer, "你好", max_new_tokens=20, max_length=None):
    if response:
        print(response[prefix:], end="")
        prefix = len(response)
Output:
Message: 'Both `max_new_tokens` (=20) and `max_length`(=37) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)'
Arguments: (<class 'UserWarning'>,)
你好👋!我是人工智能助手 ChatGLM2-6B,很高兴见到
Expected Behavior
The warning complaining about both max_new_tokens and max_length being set should not be raised when generation_config.max_length is None.
Steps To Reproduce
See Current Behavior
Environment
Anything else?
In modeling_chatglm.py (https://huggingface.co/THUDM/chatglm2-6b/blob/main/modeling_chatglm.py#L1110), the check
if not has_default_max_length:
should be modified to
if not has_default_max_length and generation_config.max_length is not None:
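For illustration, here is a minimal, self-contained sketch of what the stricter check is meant to achieve. The helper name resolve_generation_length, its arguments, and the simplified control flow (the condition is evaluated before max_length is overwritten) are assumptions made for this example; it is not the actual code from modeling_chatglm.py.

import warnings

def resolve_generation_length(max_new_tokens, max_length, default_max_length, input_len):
    # Hypothetical helper, not the real modeling_chatglm.py logic; it only shows
    # the proposed condition in isolation.
    # Mirrors has_default_max_length: True when the caller did not pass an explicit max_length.
    has_default_max_length = max_length is None
    if has_default_max_length:
        max_length = default_max_length
    if max_new_tokens is not None:
        # Proposed condition: warn only when an explicit, non-None max_length
        # was passed alongside max_new_tokens.
        if not has_default_max_length and max_length is not None:
            warnings.warn(
                f"Both `max_new_tokens` (={max_new_tokens}) and `max_length` (={max_length}) "
                "seem to have been set. `max_new_tokens` will take precedence.",
                UserWarning,
            )
        max_length = max_new_tokens + input_len  # max_new_tokens takes precedence
    return max_length

print(resolve_generation_length(20, None, 8192, 17))  # 37, no warning (the reported case)
print(resolve_generation_length(20, 100, 8192, 17))   # 37, still emits the UserWarning

With a condition like this, passing max_length=None only falls back to the default silently, while an explicitly set max_length still triggers the warning.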