What is the acceptance procedure in REST?

jivanph commented 9 months ago

Thank you so much for your contribution to the literature in decoding strategies.

After reading your paper with great attention, I noticed that in the 'Draft acceptance of REST' subsection of the paper you mention that you "check the correctness of the draft token" because you "adopt a similar acceptance strategy compared to the original speculative decoding".

But, from my understanding, in the original speculative decoding the acceptance procedure depends on computing a probability ratio between the large and small model predictions. Since there is no small model in the methodology you propose, I wanted to ask how you decide whether to accept or reject proposed tokens. Also, is there a guarantee that the accepted tokens follow the same distribution as the original (large) model?
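For context, the ratio test referred to here can be sketched roughly as follows. This is only an illustration of the acceptance step in standard speculative sampling, not code from REST; the function name `accept_draft_token` and its parameters are hypothetical, with `p_target` and `q_draft` standing for the probabilities that the large and small models assign to the drafted token.

```python
import random


def accept_draft_token(p_target: float, q_draft: float, rng: random.Random) -> bool:
    """Accept a drafted token with probability min(1, p_target / q_draft).

    p_target: probability of the token under the large (target) model.
    q_draft:  probability of the token under the small (draft) model.
    Hypothetical helper for illustration only.
    """
    if q_draft <= 0.0:
        raise ValueError("the draft model must give the drafted token nonzero probability")
    ratio = min(1.0, p_target / q_draft)
    # rng.random() is uniform on [0, 1), so this is True with probability `ratio`.
    return rng.random() < ratio
```

In that scheme a rejected token is replaced by a sample from the normalized positive part of the difference between the two distributions, which is what preserves the large model's output distribution.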

zhenyuhe00 commented 9 months ago

Thank you for your interest in our work.

As mentioned in Section 4.1 of our paper, we accept draft tokens by checking whether they match (in other words, are identical to) the "true" tokens sampled from the LLM, which ensures that the results of REST are identical to those generated by standard autoregressive generation (refer to L254 in utils.py or L268 in utils.py for code implementation).
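A minimal sketch of this exact-match check, under stated assumptions, might look like the following. It is not the code in utils.py; the names `count_accepted` and `best_candidate` are hypothetical, and `target[i]` is assumed to be the token the LLM itself produces at position `i` after seeing the prefix followed by `draft[:i]`.

```python
from typing import Sequence, Tuple


def count_accepted(draft: Sequence[int], target: Sequence[int]) -> int:
    """Return how many leading draft tokens are identical to the LLM's own tokens."""
    accepted = 0
    for d, t in zip(draft, target):
        if d != t:
            break  # the first mismatch ends acceptance for this draft
        accepted += 1
    return accepted


def best_candidate(
    candidates: Sequence[Sequence[int]],
    targets: Sequence[Sequence[int]],
) -> Tuple[int, int]:
    """Pick the candidate draft with the longest accepted prefix.

    Returns (index of the candidate, number of accepted tokens).
    Ties keep the earliest candidate.
    """
    best_idx, best_len = 0, 0
    for i, (draft, target) in enumerate(zip(candidates, targets)):
        n = count_accepted(draft, target)
        if n > best_len:
            best_idx, best_len = i, n
    return best_idx, best_len
```

Because a draft token is kept only when it equals the token the LLM would have produced at that position anyway, the accepted sequence coincides with what step-by-step decoding would give, and no probability ratio is needed.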

We are sorry for the confusion and will revise the phrasing in the next version of the paper.

jivanph commented 9 months ago

Thank you so much for your response. It clarified things for me.

zhenyuhe00 commented 9 months ago

Feel free to reopen this issue or open a new one if you have any further questions.
