Is there an existing issue for this?
Current Behavior
from transformers import AutoTokenizer, AutoModel
model_path = "models/chatglm2-6b"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True).cuda()
model = model.eval()
prefix = 0
for response, history in model.stream_chat(tokenizer, "你好", max_new_tokens=20, max_length=None):
    if response:
        print(response[prefix:], end="")
        prefix = len(response)
Output:
Message: 'Both `max_new_tokens` (=20) and `max_length`(=37) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)'
Arguments: (<class 'UserWarning'>,)
你好👋!我是人工智能助手 ChatGLM2-6B,很高兴见到
Expected Behavior
The warning complaining about both max_new_tokens and max_length being set should not be raised when generation_config.max_length is None.
Steps To Reproduce
See Current Behavior
Environment
Anything else?
In modeling_chatglm.py (https://huggingface.co/THUDM/chatglm2-6b/blob/main/modeling_chatglm.py#L1110), the check
if not has_default_max_length:
should be modified to
if not has_default_max_length and generation_config.max_length is not None:
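For illustration, here is a minimal, self-contained sketch of what the stricter check is meant to achieve. The helper name resolve_generation_length, its arguments, and the simplified control flow (the condition is evaluated before max_length is overwritten) are assumptions made for this example; it is not the actual code from modeling_chatglm.py.

import warnings

def resolve_generation_length(max_new_tokens, max_length, default_max_length, input_len):
    # Hypothetical helper, not the real modeling_chatglm.py logic; it only shows
    # the proposed condition in isolation.
    # Mirrors has_default_max_length: True when the caller did not pass an explicit max_length.
    has_default_max_length = max_length is None
    if has_default_max_length:
        max_length = default_max_length
    if max_new_tokens is not None:
        # Proposed condition: warn only when an explicit, non-None max_length
        # was passed alongside max_new_tokens.
        if not has_default_max_length and max_length is not None:
            warnings.warn(
                f"Both `max_new_tokens` (={max_new_tokens}) and `max_length` (={max_length}) "
                "seem to have been set. `max_new_tokens` will take precedence.",
                UserWarning,
            )
        max_length = max_new_tokens + input_len  # max_new_tokens takes precedence
    return max_length

print(resolve_generation_length(20, None, 8192, 17))  # 37, no warning (the reported case)
print(resolve_generation_length(20, 100, 8192, 17))   # 37, still emits the UserWarning

With a condition like this, passing max_length=None only falls back to the default silently, while an explicitly set max_length still triggers the warning.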