Open 66RING opened 3 months ago
After reading your comment, I found this phenomenon very interesting, so I tried it just now and found that the output logits of the draft model and target model are indeed inconsistent. However, as long as the seed is the same, the results are identical on every run. I therefore suspect a bug in the hardware or in the code implementation, although the error is very small. I implemented it as follows:
Thank you both for your careful observation; these details are very helpful.
Would it be possible to replace the '==' comparison of two float values with a check that their difference is less than some threshold, for example 1e-6?
That was the straightforward solution, and I changed == to torch.allclose, whose default tolerances are rtol=1e-05 and atol=1e-08, but the unexpected rejections still happen.
def allclose(input: Tensor, other: Tensor, rtol: _float = 1e-05, atol: _float = 1e-08, equal_nan: _bool = False) -> _bool: ...
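For reference, the documented torch.allclose criterion is elementwise |input - other| <= atol + rtol * |other|. The sketch below restates that check for a single pair of floats in plain Python. The helper name within_tolerance and the two sample values are hypothetical and only illustrative; the actual size of the logit gap is not stated in this thread.

```python
def within_tolerance(a: float, b: float, rtol: float = 1e-05, atol: float = 1e-08) -> bool:
    """Scalar restatement of the documented check: |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


# Hypothetical probabilities that differ by about 1e-4 (illustrative values only).
p_val, q_val = 0.5, 0.5001

# With the defaults the allowed gap is about 5e-6, so a 1e-4 gap is rejected.
default_ok = within_tolerance(p_val, q_val)

# With atol=1e-03 the allowed gap is about 1e-3, so the same pair is accepted.
loose_ok = within_tolerance(p_val, q_val, atol=1e-03)

print(default_ok, loose_ok)  # False True
```

If the real discrepancy is larger than the default tolerances allow, this may be why the rejections persist after switching from == to torch.allclose.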
Could you try it with atol=1e-03? I think this is a hardware error.
In my opinion, the generation should be the same when the draft model and target model are the same and the temperature is 0.
In this case, however, the output logits of the draft model and target model differ slightly, although the argmax result is the same.
THE QUESTION: why do the output logits differ when the draft and target are the same model?
reproduce:
p[:, prefix_len + i - 1, j] == q[:, prefix_len + i - 1, j]
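Since the argmax is reported to match even when the raw values differ slightly, one possible alternative at temperature 0 is to compare the selected token index instead of testing the values for exact equality. This is a minimal sketch in plain Python, not the repository's implementation. The logit values and the argmax helper are hypothetical.

```python
def argmax(values):
    """Return the index of the largest element (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical logits for one position: the same model evaluated twice,
# with tiny numerical differences between the two runs.
draft_logits = [1.2000001, 3.4000002, 0.5]
target_logits = [1.2, 3.4, 0.5]

# Exact elementwise equality fails because of the small differences.
exact_match = draft_logits == target_logits

# The greedy (temperature 0) token choice is still identical.
same_token = argmax(draft_logits) == argmax(target_logits)

print(exact_match, same_token)  # False True
```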