shermansiu opened 10 months ago
Yes, we have something in the works :)
Are there certain experiments/comparisons that you would be most interested in?
In terms of methods to compare:
Metrics to compare:
Hardware:
Thanks! We will definitely look into some of these.
And FYI, LADE (Lookahead Decoding) actually causes a slowdown under its default settings on an RTX 3090; its parameters have to be tuned to be less aggressive to get a mild speedup.
(The LADE authors are aware of this: it was raised by Joao Gante from the Hugging Face staff and, independently, by another user on their GitHub repo.)
Thank you for your excellent idea. However, I'd like to kindly point out that this concept may be very similar to 'Aggressive Decoding' https://arxiv.org/pdf/2106.04970.
Apoorv, do you have plans for a paper or technical report on prompt lookup decoding?
I know you've indicated that people should cite your GitHub repo, but it would be nice to have something out there with more extensive experiments across a variety of datasets, models, model sizes, and hardware types (e.g., CPU vs. GPU, various types of GPUs). A side-by-side comparison between prompt lookup decoding and other similar methods would also be valuable.
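For anyone lining up these comparisons, the core idea of prompt lookup decoding fits in a few lines. The following is a minimal sketch under stated assumptions, not the repo's implementation. It assumes greedy decoding, a hypothetical `next_token(tokens)` callable standing in for the model, and hypothetical helper names (`find_candidate_tokens`, `lookup_step`, `generate`, `greedy_generate`). It matches the trailing n-gram of the context against earlier tokens (prompt plus generated text), proposes the tokens that followed as a draft, and keeps the longest draft prefix the model agrees with, so the output is identical to plain greedy decoding. Drafting from the input rather than from a second model is also where the overlap with Aggressive Decoding comes from.

```python
from typing import Callable, List, Sequence

# Hypothetical toy interface: returns the model's greedy (argmax) next token
# for a given context. In practice this would be an LLM forward pass.
NextToken = Callable[[Sequence[int]], int]


def find_candidate_tokens(
    tokens: Sequence[int],
    max_ngram_size: int = 3,
    num_pred_tokens: int = 10,
) -> List[int]:
    """Find an earlier occurrence of the trailing n-gram (longest n first,
    most recent match first) and return up to num_pred_tokens tokens that
    followed it. Returns [] when nothing matches."""
    for n in range(max_ngram_size, 0, -1):
        if len(tokens) <= n:
            continue
        suffix = list(tokens[-n:])
        # Every start position before the suffix's own position, right to left.
        for start in range(len(tokens) - n - 1, -1, -1):
            if list(tokens[start:start + n]) == suffix:
                return list(tokens[start + n:start + n + num_pred_tokens])
    return []


def lookup_step(
    tokens: List[int],
    next_token: NextToken,
    max_ngram_size: int = 3,
    num_pred_tokens: int = 10,
) -> List[int]:
    """Verify the draft against greedy decoding and return the new tokens:
    the longest agreeing draft prefix plus one token chosen by the model."""
    draft = find_candidate_tokens(tokens, max_ngram_size, num_pred_tokens)
    accepted: List[int] = []
    # A real implementation scores every draft position in ONE forward pass;
    # calling next_token per position here only keeps the sketch simple.
    for d in draft:
        predicted = next_token(tokens + accepted)
        accepted.append(predicted)
        if predicted != d:
            return accepted  # first mismatch: keep the model's token and stop
    accepted.append(next_token(tokens + accepted))  # whole draft accepted: bonus token
    return accepted


def generate(prompt: Sequence[int], next_token: NextToken, max_new_tokens: int) -> List[int]:
    """Decode with prompt lookup; returns only the newly generated tokens."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        tokens.extend(lookup_step(tokens, next_token))
    return tokens[len(prompt):len(prompt) + max_new_tokens]


def greedy_generate(prompt: Sequence[int], next_token: NextToken, max_new_tokens: int) -> List[int]:
    """Plain one-token-at-a-time greedy decoding, for comparison."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens[len(prompt):]


if __name__ == "__main__":
    # Toy "model": the next token depends only on the previous one, so the
    # output cycles and the lookup keeps finding matches in the context.
    def toy_model(ctx: Sequence[int]) -> int:
        return (ctx[-1] * 3 + 1) % 7

    prompt = [1, 4, 6, 5]
    print(generate(prompt, toy_model, 8))
    print(greedy_generate(prompt, toy_model, 8))
```

With this toy model both calls print `[2, 0, 1, 4, 6, 5, 2, 0]`. The first three calls to `lookup_step` find no match and emit one token each; the fourth accepts a six-token draft plus one bonus token, which is where a speedup would come from if verification were a single forward pass. When drafts are long but rarely accepted, the extra verification work can instead make decoding slower, which is the kind of tradeoff the hardware comparisons above would expose.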