Closed Wangqi12138 closed 1 month ago
Hi!
It appears that the original author was using a quantized model. Unfortunately, most efficient kernels for GPTQ or AWQ use non-deterministic algorithms, so the results may differ slightly even when `do_sample` is set to `False`. In addition, recent versions of `transformers` do not accept `temperature=0` and will ask you to use `do_sample=False` instead.
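To illustrate the point about `temperature=0`, here is a minimal sketch of how a caller might translate a temperature setting into generation arguments. The helper name `to_generate_kwargs` is hypothetical and is not part of `transformers`; only the `do_sample` and `temperature` argument names come from the discussion above.

```python
def to_generate_kwargs(temperature, **extra):
    """Hypothetical helper: map a temperature to generation keyword arguments.

    A temperature of 0 is expressed as greedy decoding (do_sample=False)
    instead of being passed through, since temperature=0 is rejected.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")

    kwargs = dict(extra)  # copy so the caller's arguments are not mutated
    if temperature == 0:
        # Greedy decoding: do not pass a temperature at all.
        kwargs["do_sample"] = False
    else:
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
    return kwargs


greedy = to_generate_kwargs(0, max_new_tokens=32)
sampled = to_generate_kwargs(0.7, max_new_tokens=32)
```

The resulting dictionary could then be unpacked into a call such as `model.generate(**inputs, **greedy)`. Note that with a quantized model the outputs may still vary slightly between runs, for the kernel-related reason given above.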
However, it is unclear whether your issue is the same as the one you referenced. Please provide more details, such as which model and which framework you were using.
May I ask whether this issue has been resolved?
Originally posted by @Elissa0723 in https://github.com/QwenLM/Qwen/issues/1025#issuecomment-1960662030