I think the verification process needs one more condition. According to the proof in SpecInfer, when all candidates along a draft candidate path are accepted, `sample_token` should be sampled from the `logits` of the last accepted node. Here, you still sample from `gtp`, and the last accepted token was also sampled from `gtp`, which may cause repetition in the output. Here is what I got from your example.
Please correct me if I misunderstood anything!
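
To make the condition I have in mind concrete, here is a minimal single-path sketch in plain Python. It is not your code or SpecInfer's implementation: `verify_path`, `sample_from`, and the list-of-probabilities inputs are hypothetical names made up for illustration, and it ignores the multi-candidate tree case. The only point is the last `return`, which handles the case where every draft token is accepted.

```python
import random


def sample_from(probs, rng):
    """Draw an index from a discrete distribution given as a list of weights."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def verify_path(draft_tokens, draft_probs, target_probs, rng):
    """Verify one draft path and return (accepted_tokens, extra_token).

    Assumed layout (hypothetical, for illustration only):
    - draft_probs[i] is the draft distribution that proposed draft_tokens[i].
    - target_probs[i] is the target distribution at position i, computed from
      the logits of the node preceding draft_tokens[i]. It has one more entry
      than draft_tokens: the last entry comes from the logits of the final
      draft node.
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        p, q = target_probs[i], draft_probs[i]
        # Standard speculative-sampling acceptance test.
        if rng.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)
            continue
        # Rejected: resample from the residual distribution max(0, p - q).
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        return accepted, sample_from(residual, rng)
    # Extra condition: every draft token was accepted, so the next token is
    # drawn from the logits of the last accepted node, not from the
    # distribution that was used to accept that node.
    return accepted, sample_from(target_probs[len(draft_tokens)], rng)
```

In a toy case where the draft and target agree exactly on a two-token path, both tokens are accepted and the extra token comes from the third target distribution, so it cannot simply repeat the last accepted token unless the target model itself assigns that token probability at the new position.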