[Help] Does the model retain memory across inference calls?

Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

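Given the same context and the same query (with all random seeds fixed), the model answers differently if other questions were asked beforehand. Does the previous question change some state inside the model? Is there any way to keep the answers stable?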

Expected Behavior

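No matter how many questions were asked before, the model's output should be identical when it is given the same query and the same history (with all random seeds fixed).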

Steps To Reproduce

Load the model
Fix the random seeds
Build the same history
Submit the same query and history twice
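
The steps above can be sketched as a small harness. This is only a minimal sketch, not code from the issue: `seed_everything`, `is_repeatable`, and `toy_chat` are hypothetical helper names, and `toy_chat` is a stand-in that samples from Python's `random` so the script runs without the model. It also rests on an assumption the issue does not confirm: that generation uses sampling, in which case seeding once at startup would not be enough, because every intermediate question advances the random number generator. The harness therefore re-seeds immediately before each compared call and passes a copy of the history, to rule out a shared history object as a cause.

```python
import copy
import random
from typing import Callable, List, Tuple

History = List[Tuple[str, str]]
ChatFn = Callable[[str, History], Tuple[str, History]]


def seed_everything(seed: int = 0) -> None:
    """Reset every random number generator that is available."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional for this sketch
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
    except ImportError:
        pass  # PyTorch is optional for this sketch


def is_repeatable(chat_fn: ChatFn, query: str, history: History,
                  unrelated_query: str, seed: int = 0) -> bool:
    """Ask `query` twice with an unrelated question in between."""
    seed_everything(seed)
    first, _ = chat_fn(query, copy.deepcopy(history))

    # The intermediate question consumes random numbers.
    chat_fn(unrelated_query, [])

    # Re-seed right before the second compared call.
    seed_everything(seed)
    second, _ = chat_fn(query, copy.deepcopy(history))
    return first == second


def toy_chat(query: str, history: History) -> Tuple[str, History]:
    """Stand-in for a sampling chat model (hypothetical, for illustration)."""
    reply = f"{query}-{random.random():.6f}"
    return reply, history + [(query, reply)]


if __name__ == "__main__":
    print(is_repeatable(toy_chat, "same query", [("hi", "hello")], "other"))
```

To try this against the real model, `chat_fn` could be wired as `lambda q, h: model.chat(tokenizer, q, history=h)`, which follows the `chat` call shown in the ChatGLM2-6B README. If the answers still differ after re-seeding before every call, the randomness is probably not the only factor, and comparing the two runs with sampling turned off would be a reasonable next check.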

Environment

- OS: Ubuntu 22.04
- Python: 3.9
- Transformers: 4.30.1
- PyTorch: 2.0.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): CUDA 11.8

Anything else?

No response

THUDM / ChatGLM2-6B

[Help] Does the model retain memory across inference calls? #626
